Full-time

Site Reliability Engineer

Posted by Brace Infotech Private Ltd • Hyderabad, Telangana, India

📍 Hyderabad, Telangana 🕒 March 02, 2026

About the Role

Skills Required:

SQL, NoSQL, Nagios, CloudWatch, Zabbix, Datadog, New Relic, Prometheus, Grafana, AppDynamics, Site24x7, Telemetry, Splunk, CI/CD, DevOps, Kentico, SRE, Site Reliability, AIOps, Agentic, GenAI, AI, ML


Experience Range:

10 to 16 years


Key Responsibilities:

• Design, develop, and maintain observability, monitoring, and alerting systems for AI platforms and mission-critical backend services.

• Design telemetry pipelines, logging infrastructure, and metrics dashboards using tools such as Splunk, Prometheus, Grafana, and OpenTelemetry.

• Define and maintain SLOs, SLIs, and real-time health indicators across platform services and APIs.

• Participate in on-call rotations and lead the resolution of high-impact incidents, including root cause analysis and postmortem reporting.

• Collaborate with platform engineering teams to enforce governance, complian...
