Job Description

Key Responsibilities

  • Respond to and resolve operational incidents, identify root causes for critical issues, and implement strategies to prevent recurrence and improve platform resiliency.
  • Proactively create and manage monitoring, logging, and alerting systems to ensure high availability, performance, and visibility across all services.
  • Take a Site Reliability Engineering approach to our services, improving the deployment, monitoring and incident response end-to-end.
  • Solve complex technical problems involving SCP applications, infrastructure, and end users' use of the services.
  • Administer platform tools like Ansible, Vault, Consul, Prometheus, and Grafana to support core functions like configuration management, secrets management, monitoring, and observability.
  • Mentor and coach junior engineers in the team, fostering a collaborative and high-performing culture.
  • Drive automation for deployment and management processes using GitOps workflows as well as CI/CD pipelines.

Essential Knowledge, Skills, And Experience

  • Experience administering, maintaining, and troubleshooting Linux environments
  • Competent in automation and Bash scripting
  • Highly customer focused; able to explain technical IT concepts in a way that non-IT experts can understand
  • Hands-on experience working in a DevOps team and using agile methodologies

Plus Some Of The Following Areas Of Expertise

  • Hands-on knowledge of a range of scientific and HPC applications such as simulation software, bioinformatics tools or 3D data visualization packages
  • Experience administering and optimizing SLURM
  • Experience deploying and administering OpenStack
  • Experience with configuration automation and infrastructure as code (e.g. Ansible, HashiCorp Terraform, AWS CloudFormation, AWS Cloud Development Kit)
  • Experience deploying infrastructure and code to public cloud, especially AWS
  • Experience with software distribution frameworks such as EasyBuild or Spack
  • Familiarity with container runtimes such as Docker, Singularity or enroot
  • Experience with frameworks for regression testing and benchmarking of HPC applications, such as ReFrame


Job Details

Role Level: Not Applicable
Work Type: Full-Time
Country: India
City: Chennai, Tamil Nadu
Company Website: http://www.virtusa.com
Job Function: DevOps & QA
Company Industry/Sector: IT Services and IT Consulting
