Job Description

We are seeking an experienced Engineering Manager - SRE to lead our Site Reliability Engineering team. This role combines technical leadership, people management, and operational excellence to ensure our systems are reliable, scalable, secure, and highly available. You will drive reliability strategy, improve operational processes, and build a high-performing SRE team.

Core Responsibilities

Leadership And Team Management

  • Build, mentor, and manage a high-performing SRE team.
  • Set clear goals, conduct performance reviews, and support career growth.
  • Foster a culture of reliability, automation, and continuous improvement.
  • Collaborate cross-functionally with engineering, product, security, and DevOps teams.

Reliability And Operations

  • Define and manage SLIs, SLOs, and error budgets.
  • Ensure system reliability, performance, scalability, and availability.
  • Lead incident management, root cause analysis (RCA), and postmortems.
  • Drive improvements in observability, monitoring, and alerting systems.

Infrastructure And Automation

  • Oversee cloud infrastructure (AWS/GCP/Azure) and on-prem environments.
  • Promote infrastructure-as-code (Terraform, CloudFormation, etc.).
  • Drive automation to reduce toil and improve system efficiency.
  • Improve CI/CD pipelines and deployment reliability.

Strategy And Execution

  • Develop and execute an SRE roadmap aligned with business objectives.
  • Improve system resilience and disaster recovery processes.
  • Ensure compliance with security and regulatory requirements.
  • Track and report reliability metrics to leadership.

Requirements

  • 8+ years of experience in software engineering, DevOps, or SRE.
  • 2+ years of engineering management experience.
  • Strong expertise in cloud platforms (AWS/GCP/Azure).
  • Deep understanding of distributed systems and system architecture.
  • Experience with monitoring tools (Datadog, Prometheus, Grafana, New Relic, etc.).
  • Proficiency in at least one programming language (Python, Go, Java, etc.).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Strong incident management and production operations experience.

This job was posted by Parvinder Kaur from Snapmint.


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: India
City: Gurgaon, Haryana
Company Website: https://snapmint.com/
Job Function: Engineering
Company Industry/Sector: Capital Markets Investment Advice And Financial Services

