Full-time

Lead Site Reliability Engineer - Remote

Posted by Epam • Remote, United States

📍 Remote, US 🕒 June 10, 2026

About the Role

Description



We are looking for a candidate to join a multi-functional SRE team focused on Google Cloud Platform (GCP). You should have cloud engineering experience and be able to act as the subject-matter expert (SME) on operations automation and monitoring: identifying toil in the team's existing systems and processes, then recommending and implementing automated solutions that reduce toil and improve the team's efficiency and effectiveness.

Requirements

  • Good knowledge of GCP
  • Hands-on experience defining and creating CUJs, SLOs, SLIs, and error budgets based on NFRs
  • Strong knowledge of infrastructure as code (IaC) with Terraform, GitHub, and Docker images
  • Strong scripting and automation skills (Bash, PowerShell, Python, Ansible)
  • Good knowledge of containers and container orchestration, such as Kubernetes
  • Design and implementation of automated workflows
  • Experience in reducing toil in an SDLC or IT operations environment
  • Underst...

Ready to Apply?

Submit your application today and take the next step in your career journey with Epam.
