Job Description – On-Prem Infrastructure Engineer / SRE

Location: Pan India

Experience: 5–10 Years

Role: On-Prem Infrastructure Engineer / Site Reliability Engineer (SRE)

Job Summary

We are seeking a skilled On-Prem Infrastructure Engineer / SRE to manage and support NVIDIA’s on-prem engineering cloud infrastructure across multiple data centers. The ideal candidate will have strong experience in bare-metal infrastructure management, observability tools, automation, and production support. This role is critical to ensuring uptime, reliability, and operational excellence for engineering services.

Key Responsibilities

  • On-Prem Infrastructure Management

Manage and operate NVIDIA’s on-prem infrastructure across distributed data centers.

Maintain high availability, reliability, and readiness of on-prem engineering cloud environments.

Perform lifecycle management of bare-metal servers and underlying hardware.

  • Service Level Management

Uphold Service Level Agreements (SLAs) for mission-critical engineering services.

Implement and maintain monitoring, alerting, and incident response workflows.

Drive root cause analysis (RCA), conduct post-mortems, and ensure corrective and preventive actions are implemented.

  • Observability & Monitoring

Deploy, configure, and manage observability tools such as Prometheus, Grafana, and the ELK Stack.

Maintain KPI monitoring pipelines using Jenkins, Python, and ELK.

Develop and enhance custom monitoring dashboards and business-specific alerting rules.

  • Automation & Optimization

Contribute to capacity planning, resource optimization, and performance tuning initiatives.

Develop automation scripts/tools using Python, Go, Bash, or Jenkins pipelines.

Improve operational efficiency through continuous automation.

  • Day-to-Day Operations & Support

Monitor system alerts, troubleshoot incidents, and resolve user-reported issues.

Participate in war rooms during major or high-impact incidents.

Ensure timely escalation and resolution of production issues.

  • Collaboration & Documentation

Create and maintain technical documentation for operational procedures, architectures, and troubleshooting steps.

Work closely with engineering, DevOps, hardware, and data center teams to improve overall infrastructure reliability.

Required Skills & Experience

Strong hands-on experience in bare-metal server management using tools such as IPMI, Redfish, KVM, or similar technologies.

Experience with automation and scripting using Python, Go, Bash, and Jenkins (CI/CD pipelines).

Practical experience with infrastructure tools: Kubernetes, MySQL, Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana).

Solid understanding of system performance, capacity planning, and datacenter operations.

Strong troubleshooting, incident-response, and operational debugging skills.

Ability to work in fast-paced environments and handle production-critical scenarios.

Nice-to-Have Skills

Familiarity with NVIDIA hardware: GPUs, Tegra systems, DGX platforms, etc.

Experience in large-scale distributed systems or high-performance computing environments.

Soft Skills

Strong communication and collaboration abilities.

Analytical mindset with a focus on problem-solving.

Ability to maintain composure under pressure in incident environments.

Detail-oriented with strong documentation habits.


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: India
City: Pimpri Chinchwad, Maharashtra
Company Website: http://www.natobotics.com
Job Function: Information Technology (IT)
Company Industry/Sector: Information Technology and Services
