Full time

Senior Software Engineer, Server Manageability FMEA

Posted by NVIDIA • Remote, United States

📍 Remote, US 🕒 February 19, 2026

About the Role

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and establish teams with the most thoughtful people in the world.

NVIDIA DGX, HGX, and MGX systems deliver the world's leading solutions for enterprise AI infrastructure at scale.

We are looking for a talented and experienced engineer with a background in RAS (Reliability, Availability, and Serviceability) and failure mode and effects analysis (FMEA). You will be responsible for improving the reliability of NVIDIA GPU and Grace systems by performing failure analysis of the whole system and architecting software and firmware to be...
