Full-time

Software Development Engineer AI/ML, Inference Serving, AWS Neuron

Posted by Amazon • Cupertino, CA, United States

📍 Cupertino, CA 🕒 March 02, 2026

About the Role

Description
AWS Neuron is the software stack powering AWS Inferentia and Trainium machine learning accelerators, designed to deliver high-performance, low-cost inference at scale. The Neuron Serving team develops infrastructure to serve modern machine learning models—including large language models (LLMs) and multimodal workloads—reliably and efficiently on AWS silicon. We are seeking a Software Development Engineer to lead and architect our next-generation model serving infrastructure, with a particular focus on large-scale generative AI applications.
  
Key job responsibilities
* Architect and lead the design of distributed ML serving systems optimized for generative AI workloads
* Drive technical excellence in performance optimization and system reliability across the Neuron ecosystem
* Design and implement scalable solutions for both offline and online inference workloads
* Lead integration efforts with frameworks such as vLLM, SGLang, Torch XLA, TensorRT, and ...
                

Job Details

Location Cupertino, CA
Job Type Full-time
Category other-general
Posted March 02, 2026
Deadline March 07, 2026
