Full-time

Remote AI Model Inference Engineer — Vulkan & On-Device GPU

Posted by Tether.io • Switzerland

📍 Switzerland 🕒 March 01, 2026


About the Role

Overview
Senior AI Research Engineer, Model Inference (Remote) - Tether.io

About the job
We are looking for an experienced AI Model Engineer with deep expertise in kernel development, model optimization, fine-tuning, and GPU acceleration. The engineer will extend the inference framework to support inference and fine-tuning for language models, with a strong focus on mobile and integrated GPU acceleration (Vulkan).

This role requires hands-on experience with quantization techniques, LoRA architectures, the Vulkan backend, and mobile GPU debugging. You will play a critical role in pushing the boundaries of desktop and on-device inference and fine-tuning performance for next-generation SLMs/LLMs.

Responsibilities
Implement and optimize custom inference and fine-tuning kernels for small and large language models across multiple hardware backends.
Implement and optimize full and LoRA fine-tuning for small and large languag...

Job Details

Location: Switzerland
Job Type: Full-time
Category: Other-General
Posted: March 01, 2026
Deadline: April 10, 2026
