Job Description
Technical Expertise Required
Basic knowledge of four technical expertise areas, with a strong interest in one of them.
- Strong knowledge of Linux/Unix systems and command line tools.
- Proficiency in scripting languages such as Python, Shell, or Perl.
- Experience with configuration management tools like Ansible, Puppet, or Chef.
- Familiarity with cloud platforms like AWS, Azure, or Google Cloud.
- Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
- Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools.
- Expertise in monitoring and logging tools such as Prometheus, Grafana, the ELK stack, or Splunk (optional, but good to know).
- Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Strong attention to detail and ability to work in a fast-paced, dynamic environment.
- Basic Terraform syntax and GitLab CI/CD configuration (pipelines and jobs).
- Provisioning and configuration of cloud resources through CLI/API.
- Ability to run basic queries in log tools to answer general questions.
- Operating system (Linux) configuration, package management, startup, and troubleshooting.
- Block and object storage configuration.
- Networking: VPCs, proxies, and CDNs.
Responsibilities
- Design and implement highly available and scalable systems, ensuring the reliability and performance of the company's website or application.
- Collaborate with cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems.
- Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues.
- Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance.
- Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents.
- Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
- Create and maintain documentation for system architecture, configuration, and troubleshooting procedures.
- Perform capacity planning and resource allocation to ensure optimal system performance and scalability.
- Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability and performance standards.
- Stay up to date with industry best practices, new technologies, and emerging trends in site reliability engineering.
Execution
Follow established processes and runbooks, and submit updates to improve them for others.
Propose ideas and solutions within the Infrastructure Department to reduce workload through automation.
Plan and execute configuration change operations at both the application and infrastructure levels.
Actively look for opportunities to improve the availability and performance of the system by applying lessons learned from monitoring and observation.
Objectives of this role
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Provide primary operational support and engineering for multiple large-scale distributed software applications
Required Skills And Qualifications
- Bachelor's degree in computer science, engineering, or a related field.
- Proven experience as a Site Reliability Engineer or a similar role.
- Solid understanding of software development methodologies and DevOps principles.
- Experience with agile and iterative development processes.
- Certification in relevant technologies or frameworks is a plus (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
- Familiarity with continuous integration/continuous deployment (CI/CD) pipelines.
- Experience with source control systems such as Git or SVN.
- Knowledge of security best practices and experience implementing security measures in a production environment.
- Ability to work independently and handle multiple projects and priorities simultaneously.
- Strong analytical and problem-solving skills, with a focus on continuous improvement and automation.
Preferred Skills And Qualifications
- Previous success in technical engineering
- Coding experience beyond simple scripts
About Us
At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus.
Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. Explore Life at Zensar and join us to Grow. Own. Achieve. Learn. to be the best version of yourself.
We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.