Job Description

Primary Title: Site Reliability Engineer (SRE)

About The Opportunity

We are a fast-growing organization in the Cloud Infrastructure and Enterprise Software sector, delivering highly available, secure, and scalable platforms for customer-facing applications. We run production services at scale and are hiring an on-site Site Reliability Engineer in India to strengthen reliability, automation, and operational excellence across our systems.

Role & Responsibilities

  • Operate and improve production systems to meet defined SLAs/SLOs, monitoring availability, performance, and capacity for critical services.
  • Design, implement, and maintain infrastructure as code for cloud environments to enable repeatable, secure deployments.
  • Build, maintain, and enhance CI/CD pipelines and deployment strategies (blue/green, canary, rolling) to reduce release risk and lead time.
  • Implement and own observability (metrics, logging, and tracing), along with alerting and runbooks, to accelerate incident detection and resolution.
  • Automate operational runbooks and routine tasks using scripting and configuration management to reduce toil and improve reliability.
  • Lead incident response, postmortems, and root-cause analysis; collaborate with engineering teams to prevent recurrence and improve system design.

Skills & Qualifications

Must-Have

  • Kubernetes
  • Docker
  • Terraform
  • Prometheus
  • Grafana
  • AWS
  • Python
  • Jenkins

Preferred

  • Go
  • Ansible
  • GCP

Qualifications

  • Proven experience in site reliability, platform engineering, or DevOps supporting production services.
  • Strong troubleshooting skills, with hands-on experience in incident management and postmortems.
  • Must be available to work on-site in India and participate in on-call rotations.

Benefits & Culture Highlights

  • Hands-on ownership of reliability improvements and platform automation in a high-impact environment.
  • Collaborative engineering culture focused on learning, mentorship, and operational excellence.
  • Structured career growth paths for SREs and opportunities to work with modern cloud-native tooling.

Keywords: Site Reliability Engineer, SRE, observability, incident response, IaC, Kubernetes, Terraform, Prometheus, Grafana, AWS, Python, CI/CD, automation, on-call, production engineering.

Skills: Site Reliability Engineer, SRE, Prometheus, Elasticsearch, Jenkins, GCP, Terraform, Docker, Python, Kubernetes, Ansible, Grafana, Go, AWS


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: India
City: Hyderabad, Telangana
Company Website: www.viraajhrsolutions.com
Job Function: Engineering
Company Industry/Sector: Software Development
