Job Description

Department

Digital & Technology Office

Employee Type

Probationary

At Cebu Pacific, we embrace challenges head-on, staying at the forefront of innovation through agile technology, data-driven insights, and a relentless focus on customer and operational excellence. A career with our Digital Team offers you the opportunity to shape the future of air travel, whether by developing cutting-edge digital products, harnessing the power of analytics, or driving impactful projects that enhance passenger experiences and streamline operations.

We don’t just build technology; we create experiences and innovations that inspire and redefine the future of travel. Be part of this exciting journey and let your expertise take flight as a moment maker in the ever-evolving field of Digital as a Senior Site Reliability Engineer. Visit our careers site to learn more about how your moment matters at Cebu Pacific: Cebu Pacific Careers Site

The Senior Site Reliability Engineer will serve as the first line of defense for our 24/7 operations. You will act as the guardian of our production environment, utilizing Dynatrace to maintain a holistic view of both Infrastructure and Application health.

You will not just monitor uptime; you will actively test system resilience, manage major incidents, and facilitate stability reporting. You will be the primary notification point for all P1/P2 incidents, responsible for deep-dive triage, quick remediation, and coordinating Major Incident Management (MIM).

Primary Responsibilities:

24/7 Incident Command & Alerting

  • 24/7 Availability: Participate in a shift rotation or on-call schedule to ensure continuous coverage. You are the "eyes on glass" for the organization.
  • Unified Alerting: Manage the notification workflow. Ensure that Critical Alerts for both Infrastructure failures and Application failures trigger immediate notifications to the 24/7 team.
  • Major Incident Management (MIM): Lead the technical response during critical outages. Coordinate cross-functional teams to restore service rapidly.

Observability Strategy (Dynatrace Focus)

  • Dynatrace Administration: Act as the Subject Matter Expert (SME) for our Dynatrace implementation.
      • Configure Management Zones, Alerting Profiles, and Dashboards to provide a "Single Pane of Glass."
      • Utilize Dynatrace PurePath for distributed tracing to identify bottlenecks in microservices.
      • Leverage Davis AI to automatically detect anomalies and reduce alert noise.
  • Comprehensive Monitoring Scope:
      • Network Health: Monitor VPN Tunnel status, Load Balancer (ALB/NLB) health, and DNS latency. Trigger: Alert on packet loss or high latency.
      • Infrastructure Health: Monitor Disk/Volume usage, CPU/Memory saturation, and SSL Certificate expiry.
      • Security: Monitor for DDoS attack patterns and WAF spikes.

Resilience & Chaos Engineering

  • Chaos Engineering: Plan and execute Chaos Engineering exercises (e.g., simulating pod failures, network latency, zone outages) to test the system's resilience and verify that failover mechanisms work as expected.
  • Reliability Recommendations: Proactively analyze trends and provide architectural recommendations to development and infrastructure teams to improve system stability.
  • First Line Troubleshooting: Serve as the L1/L2 troubleshooter for Kubernetes (EKS), AWS, and Linux issues. Execute "Quick Fix" runbooks to mitigate impact before escalating to platform engineering.

Application Triage & Analysis

  • Deep-Dive Triage: Go beyond "system check" to perform deep analysis using Dynatrace. Analyze stack traces and exception logs to pinpoint the exact line of code causing the failure.
  • Root Cause Differentiation: Rapidly differentiate between an Infrastructure Issue (e.g., Network timeout) vs. an Application Logic Error (e.g., NullPointer caused by bad data).
  • Blameless RCA: Facilitate Root Cause Analysis sessions to ensure permanent fixes are applied to recurring problems.

Governance & Reporting (Stability Cadence)

  • Stability Calls: Facilitate and lead the Weekly/Bi-Weekly Stability Call. Present the health status of all technical towers to leadership and stakeholders.
  • Reporting: Generate regular reports on system uptime, error budgets, incident trends, and MTTR (Mean Time To Recovery).
  • Cross-Tower Visibility: Ensure that the dashboards and reports provide value to all teams (Network, App, Cloud), ensuring no siloed "blind spots" in production.

Automation & Toil Reduction

  • Remediation Scripting: Develop scripts (Python/Bash) to "Auto-Heal" common issues (e.g., clearing logs when disk is full, restarting stuck services).
  • Process Improvement: Identify manual checks and convert them into automated Dynatrace alerts or synthetic tests.

Qualifications:

  • Shift Availability: Must be willing to work in a 24/7 shift environment or strictly defined on-call rotation.
  • Dynatrace Expertise: Deep experience administering and using Dynatrace in a production environment (Dashboards, OneAgent, PurePaths).
  • Troubleshooting Expertise:
      • Network: Understanding of DNS, TCP/IP, Load Balancing, and Firewalls.
      • Compute/Storage: Understanding of block vs. object storage, CPU steal time, and memory management.
  • Governance: Experience facilitating technical management calls and producing executive-level reliability reports.
  • Application Debugging: Ability to read application logs (Java, Node, Python) to understand why a service failed.
  • Cloud (AWS) & K8s: Solid understanding of EKS, EC2, and other AWS services.

Why Join Us:

  • We are the first Great Place to Work® certified airline in Southeast Asia.
  • We have been recognized as Best Employer Brand on LinkedIn for two consecutive years.
  • Be part of a forward-thinking team that values innovation and continuous improvement.
  • Play a key role in developing and nurturing the talents that drive our success.
  • Accelerate your career with access to extensive learning programs and leadership development initiatives, all under Ceb U, our corporate university.
  • Enjoy unique employee perks such as free travel for you and your family, with coverage extended to common-law and same-sex partners.
  • Be assured of comprehensive healthcare coverage upon hire.

Note: This position is for an Individual Contributor and will be based in Pasay City, Metro Manila, but currently follows a hybrid work arrangement.

Your moment matters. Be a Moment Maker!

Cebu Pacific warns the public against fake hiring and training advertisements by unknown groups. We do not require payment from candidates during the recruitment process, nor do we require submission of physical application documents. For official information on our job openings, please visit our LinkedIn page or our careers site (Cebu Pacific Careers Site).

Experience Range (Years)

4 - 8 years

Job posted on

2026-03-12


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: Philippines
City: Pasay, National Capital Region
Company Website: https://www.cebupacificair.com/en-PH/pages/about/careers
Job Function: DevOps & QA
Company Industry/Sector: Airlines and Aviation
