Full-time

LLM Inference Performance & Evals Engineer

Posted by Cerebras Systems • Toronto, ON, Canada

📍 Toronto, ON 🕒 May 28, 2026

About the Role

Join the inference model team dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Key Responsibilities

Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge.

Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests.

Work closely with the compiler, runtime, and silicon teams, a unique opportunity to experience the full stack of software and hardware innovation.

Keep pace with the latest open- and closed-source models; run them first on wafer-scale hardware to expose new optimization opportunities.

Skills And Qualifications

3+ years building high-performance ML or systems software. Solid grounding in Transformer math...
