Job Description

As a Site Reliability Engineer (SRE) 14N25, you will be integral to transforming and maintaining reliable systems while working across diverse engineering, operations, and support teams. Your primary focus will be ensuring the uptime, performance, and resilience of crucial online platforms and services. By applying both software engineering and systems engineering approaches, you will not only troubleshoot and resolve incidents but also prevent them by building automated tools and systems. You will collaborate closely with developers to foster a software delivery culture that values velocity and stability equally. If you are keen on diving deep into data analysis, performance bottleneck identification, and system architecture, this opportunity offers a challenging yet fulfilling professional journey.


Responsibilities

  • Monitor the performance and reliability of active systems and applications effectively.
  • Develop and maintain operations solutions to increase system reliability and performance.
  • Collaborate closely with development and engineering teams to ensure seamless deployment processes.
  • Employ incident management to troubleshoot and resolve production issues swiftly.
  • Create and refine diagnostic and monitoring tools to detect system anomalies.
  • Integrate automation technologies to reduce manual intervention and human errors.
  • Participate in on-call rotations to provide timely emergency support and escalation.
  • Document system designs, software implementations, and operational processes comprehensively.
  • Conduct post-incident reviews to understand root causes and propose long-term solutions.
  • Evaluate the current technology stack and recommend improvements for scaling purposes.
  • Ensure compliance with security policies and protect data integrity across the platforms.
  • Contribute to a culture of continuous improvement and operational excellence.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field.
  • Minimum of 3 years of experience in site reliability or similar engineering roles.
  • Proficiency in one or more programming languages, such as Python, Java, or Go.
  • Strong understanding of cloud technologies and container orchestration tools like Kubernetes.
  • Experience with monitoring and observability tools, such as Prometheus or Grafana.
  • Excellent problem-solving skills and a methodical approach to systems management.
  • Strong communication skills to effectively collaborate with cross-functional teams.


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: Philippines
City: Manila, National Capital Region
Company Website: https://www.talentmate.com
Job Function: DevOps & QA
Company Industry/Sector: Recruitment & Staffing

About the Company

Searching, interviewing, and hiring are all part of professional life. The TALENTMATE Portal aims to help professionals with each of these by bringing everything they need together under one roof. Whether you're hunting for your next job opportunity or looking for potential employers, we're here to lend a helping hand.

Disclaimer: talentmate.com is only a platform to bring jobseekers & employers together. Applicants are advised to research the bona fides of the prospective employer independently. We do NOT endorse any requests for money payments and strictly advise against sharing personal or bank-related information. We also recommend you visit Security Advice for more information. If you suspect any fraud or malpractice, email us at abuse@talentmate.com.

