Job Description

Senior Software Engineer / Site Reliability Engineer (SRE) – Observability & Platform Engineering

Must-Have Skills (Required)

Core Engineering & Platform Skills

  • Strong proficiency in at least one of the following: Python, JavaScript (Node.js), or Java
  • Hands-on experience with API integrations (designing, consuming, and integrating APIs)
  • Strong experience working in Kubernetes environments, including deployment, operations, and monitoring

Observability & Monitoring

  • Experience with DataDog (preferred) or similar tools such as Prometheus or Grafana
  • Ability to configure dashboards, alerts, and APM (tracing, metrics, logging)
  • Experience monitoring containerized and microservices architectures

Cloud & Infrastructure

  • Hands-on experience with AWS
  • Experience integrating observability tools into cloud environments

SRE & Operations

  • Experience with CI/CD integrations for observability (e.g., DataDog in pipelines)
  • Ability to automate monitoring and operational tasks using scripting (Python preferred)

Strongly Preferred Skills

  • Experience owning and operating an internal engineering platform
  • Deep experience with observability platforms
  • Demonstrated ownership of reliability, scalability, and performance
  • Proven ability to proactively lead maintenance efforts and platform improvements
  • Experience installing and configuring DataDog agents and integrations
  • Experience managing API keys and secure configurations
  • Experience managing user roles and access controls within observability platforms

Nice-to-Have Skills (Preferred)

  • Familiarity with Go (Golang)
  • Experience with additional observability tools such as New Relic, Dynatrace, Elastic, or Splunk Observability

Description

Project Overview:

We are seeking a Senior Software Engineer / SRE with an Observability focus to support platform reliability, monitoring, and modernization initiatives. This role blends software engineering (60–70%) with site reliability engineering (30–40%), with a strong emphasis on Kubernetes and observability platforms.

Key Responsibilities

  • Support platform reliability, monitoring, and modernization initiatives
  • Provide operational and training support for DataDog, the Observability Platform for R&D
  • Enhance observability, reliability, and performance across engineering platforms
  • Drive automation and operational excellence for monitoring and alerting frameworks
  • Support Kubernetes-based platform operations and monitoring integrations

Timezone Coverage

  • PST Coverage Required


Job Details

Role Level: Not Applicable
Work Type: Full-Time
Country: India
City: Hyderabad, Telangana
Company Website: https://www.jadeglobal.com
Job Function: DevOps & QA
Company Industry/Sector: IT Services and IT Consulting
