Job Description

Staff Machine Learning Engineering (Remote)

Cisco

Philadelphia, PA

  • On-site, Remote

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 19 hours ago

Job Description

The application window is expected to close on: 02/28/2026

The job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

This role can be performed remotely from locations within the United States.

Meet the Team

Splunk, a Cisco company, is building a safer, more resilient digital world with an end-to-end, full-stack platform designed for hybrid, multicloud environments.

The Splunk AI Platform and Services team provides the core runtime and developer experience that power AI across Splunk and Cisco. We manage large-scale, multi-tenant LLM inference across major cloud providers and build platform services to support these workloads. We also provide VectorDB/RAG services and MCP services that make AI workloads secure, observable, and cost-efficient for product teams.

On top of this foundation, we deliver agentic frameworks, SDKs, tools, and evaluation/guardrail capabilities that help teams quickly build reliable GenAI assistants and automation features. You'll join a group that sits at the intersection of distributed systems, ML, and developer experience, grounded in operational excellence and a culture of impact-driven, cross-functional collaboration.

Your Impact

  • Lead the end-to-end architecture for key areas of the AI Platform: multi-tenant LLM serving (vLLM/Ray), routing and orchestration layers, VectorDB/RAG integration, and agentic/SDK surfaces used by product teams.
  • Design and drive implementation of high-scale inference services, including parallelism strategies (TP/PP/EP/MoE), autoscaling policies, and cross-region capacity management for GPU/CPU workloads.
  • Optimize latency, throughput, and cost for large-scale LLM and generative workloads using techniques such as batching, chunked prefills, caching, and mixed precision.
  • Design and tune distributed inference configurations (TP/PP/EP/MoE) across multi-GPU and multi-node clusters and modern GPU architectures.
  • Implement platform capabilities such as telemetry, metering & throttling, guardrails, and rollout/rollback to ensure AI services are safe, observable, and multi-tenant by default.
  • Lead the design of GenAI application services, such as chat assistants and automation APIs, grounded in robust RAG pipelines, agentic workflows (LangChain/LangGraph or similar), and MCP-based tool ecosystems.
  • Drive operational excellence with runbooks, readiness checklists, CI/CD safeguards, on-call rotations, and post-incident improvements.
  • Provide technical mentorship and leadership for senior and mid-level engineers: review designs, guide trade-offs around quality/latency/COGS, and help grow the next generation of tech leads.
  • Collaborate closely with applied scientists to productionize new models and techniques, ensuring that research prototypes become robust, observable, and cost-efficient services.

Minimum Qualifications

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • 8+ years of hands-on experience building and operating backend or distributed systems in production, or 5+ years with a Master's degree, or 3+ years with a PhD.
  • Proven track record as a technical lead for complex systems: driving architecture, aligning stakeholders, and delivering high-impact projects end-to-end.
  • Strong proficiency in at least one modern programming language (e.g., Python, Go, or Java) and deep experience with software design, debugging, and performance tuning.
  • Significant experience with cloud-native architectures (containers, Kubernetes, service discovery, configuration management, CI/CD) and building reliable microservices (REST/gRPC).
  • Demonstrated ownership of production services at scale, including on-call participation, incident response, and post-incident reviews/RCAs that led to concrete improvements.

Preferred Qualifications

  • Hands-on experience running LLM or deep learning inference at scale using frameworks such as vLLM, TensorRT-LLM, Triton Inference Server, or similar.
  • Deep understanding of GPU and distributed systems performance: latency/throughput trade-offs, pipelining, model parallelism (TP/PP/EP/MoE), mixed precision (BF16/FP8/nvFP4), and profiling tools.
  • Experience designing and operating RAG systems and GenAI application layers: document ingestion, chunking/embedding strategies, metadata design, hybrid retrieval, context ranking, and evaluation of retrieval quality.
  • Practical experience with agentic frameworks (LangChain, LangGraph, LlamaIndex, Semantic Kernel, or similar) and multi-agent coordination, including integration with MCP tools and internal/external APIs.
  • Background building platform or developer-experience capabilities (shared services, SDKs, templates, micro-frontends) that are adopted by multiple product teams.
  • Familiarity with LangSmith or similar evaluation platforms, including experiment design, offline/online evals, hallucination/groundedness metrics, and feedback loops.
  • Strong knowledge of AWS, Azure, or GCP (EC2/VMs, IAM roles/ARNs/principals, VPC networking, security best practices) for AI workloads.
  • Experience defining and monitoring dashboards and alerts for high-availability systems using Prometheus, Grafana, or cloud-native tooling.
  • Excellent communication and collaboration skills: comfortable influencing cross-functional partners and other senior engineers, and explaining trade-offs between quality, latency, and cost to both technical and non-technical audiences.

Why Cisco?

At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.

Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.

We are Cisco, and our power starts with you.

Message to applicants applying to work in the U.S. and/or Canada: The starting salary range posted for this position is $193,800.00 to $245,300.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits.

Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process.

U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.

The applicable full salary ranges for this position, by specific state, are listed below:

New York City Metro Area: $212,300.00 - $317,100.00

Non-Metro New York state & Washington state: $193,800.00 - $282,100.00
