Full-time
Senior Data Platform Reliability Engineer
Posted by NISPI • Mandaluyong, Metro Manila, Philippines
About the Role
Key Responsibilities
Operate & Scale Managed Services
- Run multi-tenant data and AI platforms (Spark, Airflow, Flink, Jupyter) with clearly defined SLAs, SLIs, and SLOs
- Own capacity planning, cost optimization, and usage guardrails across AWS/GCP and Kubernetes
- Ensure predictable, reliable operations across multiple customers and environments
Reliability & Incident Leadership
- Be the face of reliability, leading incidents end-to-end
- Own customer communications, status updates, and post-incident reviews (RCAs) with clear, visible action items
- Drive continuous improvement based on incident learnings
Customer Experience & Enablement
- Partner with data scientists and customer teams to reduce failed or slow jobs
- Improve time-to-data, pipeline reliability, and overall platform performance
- Optimize platform usage and costs so customers experience faster pipelines and fewer surp...
Ready to Apply?
Submit your application today and take the next step in your career journey with NISPI.