Site Reliability Engineer
Posted by Finexus Sdn Bhd • Kuala Lumpur, Kuala Lumpur, Malaysia
About the Role
Responsibilities
Ensure high availability and reliability of IT systems, applications, and PCI DSS-certified data centres, supporting both internal operations and client-facing platforms.
Perform system administration (Linux and Windows servers), including installation, configuration, patching, monitoring, and performance tuning.
Manage data storage, backup, and disaster recovery planning (DRP) to ensure data integrity, resilience, and compliance with industry standards.
Conduct capacity planning and lifecycle management of infrastructure resources, ensuring optimal performance and scalability.
Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to measure and improve reliability.
Implement chaos testing and fault-injection practices to proactively identify weaknesses and improve system resilience.
Optimize observability and alerting systems (e.g., Prometheus, Grafana, ELK, Nagios ...