Job Description

Your Role

  • Run managed services, not just systems. Operate multi-tenant data/AI platforms (Spark, Airflow, Flink, Jupyter) with clear SLAs/SLIs/SLOs, cost guardrails, and capacity plans across AWS/GCP + Kubernetes.
  • Be the face of reliability. Lead incidents end-to-end, and own customer comms and post-incident reviews (RCAs with action items customers can see and feel).
  • Design for customer experience. Help data scientists and customers reduce failed or slow jobs, improve time-to-data, and optimize costs, so customers notice faster pipelines and fewer surprises.
  • Standardize & scale. Build service runbooks, golden paths, and automation that make onboarding and daily ops predictable across customers.
  • Automate the toil away. Ship tooling (Bash/Python, GitOps, CI/CD) for backups, DR drills, upgrades, access, and environment bootstrapping.
  • Make signals meaningful. Instrument platforms with metrics, logs, and traces; tune alerting to cut noise and improve detection and response times.
  • Govern change. Plan and execute upgrades/migrations within change windows; champion safe deploys and rollback strategies.
  • Partner & mentor. Guide junior engineers; collaborate with customer dev/data teams to unblock delivery and raise the reliability bar.
  • Participate in on-call. Join a 24x7 rotation with crisp handoffs and playbooks.


Your Qualifications

  • Hands-on experience supporting ETL/ELT, SQL, and production pipelines/workflows.
  • Strong experience in at least one of Spark, Airflow, Flink, or Jupyter (plus the ecosystem around it).
  • Solid working knowledge of at least one language (Python, Java, or Scala) for automation, data manipulation, and orchestration.
  • Real-world experience with AWS or GCP in production environments, as a user or administrator.
  • Kubernetes (or Docker) for scheduling/scale.
  • Incident management, post-incident reviews, change management, and service reporting.
  • Clear customer-facing comms (status updates, RCAs, runbooks).
  • 5+ years across the domains above, with depth in at least 1–2 tools per domain.


Plus points if you have:

  • Certifications: CKA/CKAD, AWS (Associate/Professional), or equivalent.
  • IaC & DevOps: Terraform, Helm, Argo CD/GitOps, CI/CD for data platforms.
  • Observability & ITSM: Prometheus/Grafana/Datadog; Jira Service Management/ServiceNow, StatusPage.
  • Security & compliance basics (least-privilege access, audit trails).


Job Details

Role Level: Mid-Level
Work Type: Full-Time
Country: Philippines
City: Mandaluyong, National Capital Region
Company Website: https://www.opswerks.com
Job Function: Engineering
Company Industry/Sector: Technology, Information and Media
