Job Description

The Site Reliability Engineer (SRE) role combines software engineering and systems engineering to build and run large-scale, fault-tolerant systems. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of the systems and services that support our business operations. You will work closely with software engineers to improve the design and architecture of systems, as well as automate and enhance the system infrastructure and operations processes. The aim is to create efficient, reliable, and scalable solutions by leveraging both your software and systems expertise. If you thrive in a fast-paced environment, enjoy solving complex systems problems, and have a passion for automation and operational excellence, this role is for you.


Responsibilities

  • Design, develop, and implement software solutions to improve system reliability and availability.
  • Collaborate with development teams to ensure applications are efficiently supported and scalable.
  • Monitor and maintain system performance through troubleshooting, debugging, and optimizing.
  • Create and maintain automation tooling for system management and deployment processes.
  • Implement monitoring solutions to detect and resolve system issues proactively.
  • Lead incident response and postmortem meetings to minimize downtime and improve services.
  • Develop and document operational best practices and share knowledge with the team.
  • Configure network settings to enhance performance and ensure high availability.
  • Conduct regular system and security audits to ensure compliance with industry standards.
  • Contribute to the development and enhancement of disaster recovery plans and exercises.
  • Collaborate with cross-functional teams to manage and optimize service-level agreements.
  • Provide technical guidance and mentorship to junior team members and new hires.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Minimum of 5 years of experience in a site reliability or systems engineering role.
  • Proficiency in scripting and programming languages such as Python, Go, or Java.
  • Experience with cloud platforms like AWS, Google Cloud, or Microsoft Azure.
  • Strong understanding of networking concepts and system architecture principles.
  • Knowledge of configuration management tools like Ansible, Puppet, or Chef.
  • Excellent problem-solving skills and the ability to work under pressure.
  • Ability to communicate complex technical concepts to non-technical stakeholders.


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: Philippines
City: Manila, National Capital Region
Company Website: https://www.talentmate.com
Job Function: DevOps & QA
Company Industry/Sector: Recruitment & Staffing

About the Company

Searching, interviewing, and hiring are all part of professional life. The TALENTMATE Portal aims to help professionals with each of these by bringing everything they need together under one roof. Whether you're hunting for your next job opportunity or looking for potential employers, we're here to lend you a helping hand.

Disclaimer: talentmate.com is only a platform to bring jobseekers and employers together. Applicants are advised to research the bona fides of the prospective employer independently. We do NOT endorse any requests for money payments and strictly advise against sharing personal or bank-related information. If you suspect any fraud or malpractice, email us at abuse@talentmate.com.

