Full-time

Site Reliability Engineer

Posted by E-IT • Ottawa, ON, Canada

📍 Ottawa, ON 🕒 June 04, 2026

About the Role

Job Description

Key Responsibilities:

  • Incident Management and Reliability: Lead the incident management process to ensure high availability and performance of applications. Develop and implement SRE practices that improve system reliability and resilience.
  • Monitoring and Observability: Utilize Dynatrace, Splunk, and Grafana to monitor system health, detect anomalies, and provide actionable insights for performance optimization.
  • Root Cause Analysis: Conduct thorough root cause analysis of incidents and outages, and develop long-term solutions to prevent recurrence.
  • DevOps Practices: Collaborate with development and operations teams to streamline CI/CD pipelines, automate workflows, and implement infrastructure as code (IaC) for efficient service deployment and management.
  • Networking Expertise: Provide expertise in networking technologies (Cisco, Arista, AVI, etc.), ensuring...

Ready to Apply?

Submit your application today and take the next step in your career journey with E-IT.
