Full-time

Deep Learning Kernel Software Performance Architect

Posted by NVIDIA • Shanghai, China

📍 Shanghai, China 🕒 February 26, 2026

About the Role

NVIDIA is seeking Software Performance Architects to optimize GPU kernel performance for state-of-the-art data-center platforms. We build automated, data-driven workflows to detect, explain, and prevent performance regressions across key deep learning workloads, partnering closely with kernel developers, compiler teams, infrastructure, and architecture/performance groups.


What you'll be doing:
+ Performance analysis + debugging
  + Validate and analyze performance of GPU-accelerated kernels and key deep learning building blocks.
  + Debug performance issues end-to-end: reproduce, isolate root causes, propose fixes or mitigation paths, and drive closure with the owning teams.
  + Build performance narratives using structured evidence: baselines, controlled comparisons, and regression attribution.
+ Automation + regression infrastructure (Python-heavy)
  + Develop and maintain Python-based automation for performance testing and analysis, using modern AI-assisted ...
