Full-time

Senior Data Platform Reliability Engineer

Posted by OpsWerks • Cebu City, Central Visayas, Philippines

📍 Cebu City, Central Visayas 🕒 March 04, 2026

About the Role

Your Role
Run managed services, not just systems. Operate multi-tenant data/AI platforms (Spark, Airflow, Flink, Jupyter) with clear SLAs/SLIs/SLOs, cost guardrails, and capacity plans across AWS/GCP + Kubernetes.
Be the face of reliability. Lead incidents end-to-end, own customer comms and post-incident reviews (RCA with actions customers can see and feel).
Design for customer experience. Help data scientists and customers reduce failed or slow jobs, improve time-to-data, and optimize costs, so customers notice faster pipelines and fewer surprises.
Standardize & scale. Build service runbooks, golden paths, and automation that make onboarding and daily ops predictable across customers.
Automate the toil away. Ship tooling (Bash/Python, GitOps, CI/CD) for backups, DR drills, upgrades, access, and environment bootstrapping.
Make signals meaningful. Instrument platforms with metrics, logs, and traces; tune alerting to cut noise and improve detection and response times.
Govern ch...

Ready to Apply?

Submit your application today and take the next step in your career journey with OpsWerks.
