Full-time

Principal Software Engineer - AI Inference

Posted by NVIDIA • Santa Clara, CA, United States

📍 Santa Clara, CA 🕒 February 23, 2026

About the Role

NVIDIA is the platform for every new AI-powered application. We seek a Principal Software Engineer - AI Inference to advance open-source LLM serving. This role involves contributing to upstream inference engines such as vLLM and SGLang. You will ensure they run exceptionally well on NVIDIA GPUs and systems, and you will strengthen the underlying stack for high-throughput, low-latency inference at scale.

This is a hands-on, deeply technical role for someone who excels at the intersection of inference runtime architecture, GPU performance engineering, and distributed systems. You will collaborate closely with internal model teams, infrastructure/SRE, and product teams to ensure NVIDIA platforms are first-class participants in the broader inference ecosystem. You will also deliver production-grade improvements that benefit both NVIDIA and the community.

What you'll be doing:
+ Drive upstream-first engineering in vLLM/SGLang: author and land PRs or equivalent experience, eng...

Job Details

Location Santa Clara, CA
Job Type Full-time
Category other-general
Posted February 23, 2026
Deadline March 04, 2026
