LLM Inference Frameworks and Optimization Engineer
Posted by Together AI • Singapore, Singapore
About the Role
At Together AI, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference frameworks, algorithms, and infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency.
We are seeking an Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines that support multimodal and language models at scale. This role will focus on low-latency, high-throughput inference, GPU/accelerator optimizations, and software-hardware co-design, ensuring efficient large-scale deployment of LLMs and vision models.
This role offers a unique opportunity to shape the future of LLM inference infrastructure, ensuring scalable, high-performance AI deployment across a diverse range of applications. If you're passionate about pushing the boundaries of AI inference, we'd love to hear from you.
Responsibilities
Ready to Apply?
Submit your application today and take the next step in your career journey with Together AI.
Apply Now