Job Description

Overview

We are seeking a self-driven, inquisitive Site Reliability Engineer (SRE) to drive reliability, availability, performance, and security across our global digital product ecosystem. This role is central to ensuring a seamless and resilient experience for our users by blending deep engineering expertise with operational excellence and automation.

You will be part of a global SRE practice supporting a portfolio of 260+ modern cloud-native applications across consumer, commercial, supply chain, and enablement functions. Your mission: prevent incidents before they occur, ensure rapid recovery when they do, and build scalable systems that evolve with our growing business.

Responsibilities

  • Champion reliability, observability, and operational excellence across mission-critical applications.
  • Develop and maintain service-level indicators (SLIs), objectives (SLOs), and error budgets to measure and improve system performance.
  • Implement automated monitoring, alerting, and recovery mechanisms to reduce manual intervention and improve response times.
  • Collaborate closely with software engineering, platform, and operations teams to embed SRE practices across the development lifecycle.
  • Lead and participate in incident response, root cause analysis, and postmortem reviews to drive long-term improvements.
  • Identify and eliminate sources of toil through automation, tooling, and process refinement.
  • Continuously improve resiliency design, capacity planning, and release management in production systems.
  • Influence engineering teams with best practices on cloud-native architecture, observability, and deployment strategies.

Qualifications

Required Skills:

  • 5+ years of experience in production engineering, DevOps, or SRE roles.
  • Strong foundation in Linux systems, networking, and cloud platforms (Azure, AWS, or GCP).
  • Hands-on experience with observability tools (e.g., AppDynamics, Prometheus, Grafana, ELK, FullStory).
  • Proficiency in scripting or programming (e.g., Python, Bash, Go) and automation frameworks (e.g., Ansible, Terraform).
  • Deep understanding of CI/CD pipelines, release strategies, and deployment automation.
  • Experience managing high-scale, distributed systems in cloud-native environments.
  • Strong analytical skills and a passion for continuous improvement.

Preferred Skills:

  • Familiarity with microservices, Kubernetes, containers, and service mesh architecture.
  • Exposure to incident and problem management frameworks (e.g., ITIL, RCA practices).
  • Experience working in global teams supporting mission-critical applications.


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: India
City: Hyderabad, Telangana
Company Website: http://www.pepsico.com
Job Function: Information Technology (IT)
Company Industry/Sector: Manufacturing and Food and Beverage Services
