About the Role
Job Description
We are seeking a hands‑on Site Reliability Engineer (SRE) / AI Platform DevOps Engineer to own infrastructure provisioning, CI/CD automation, telemetry pipelines, and production deployment for AI‑powered services, agents, and orchestration systems.
This is an SRE‑heavy, infrastructure‑first role focused on ensuring that AI systems running in production are:
- Reliable
- Observable
- Scalable
- Secure
- Cost‑efficient
- Safe to deploy and operate
You will play a critical role in building and maintaining the platform foundation that enables AI services to run safely and efficiently at scale.
Key Responsibilities
1. Infrastructure Provisioning & Automation
- Design and manage cloud infrastructure using Infrastructure as Code (Terraform or similar)
- Provision and maintain Kubernetes clu...
Ready to Apply?
Submit your application today and take the next step in your career journey with Endava.