Full-time

GenAI Evaluation Engineer

Posted by Diverse Lynx • Bellevue, Geneva, Switzerland

📍 Bellevue, Geneva 🕒 March 01, 2026

About the Role

  • Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
  • Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
  • Hands-on expertise in Eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
  • Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output)
  • Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
  • Python or other programming languages for automation, data analysis, batch evaluation execution, and API integration
  • Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)
  • Ability to create datasets, test cases, benchmarks, and ground truth references for consistent...
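To illustrate the kind of structured eval suite the role describes, here is a minimal sketch in Python. All names (`EvalCase`, `toy_model`, the sample prompts) are hypothetical; a real suite would call an actual LLM endpoint and track further metrics such as relevance, safety, and latency alongside accuracy.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    """One test case: a prompt plus its ground-truth reference answer."""
    prompt: str
    expected: str

def exact_match_accuracy(cases, model):
    """Fraction of cases where the model output matches the reference exactly."""
    hits = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    return hits / len(cases)

# Stub standing in for a real model call (an assumption for this sketch).
def toy_model(prompt):
    return {"2+2?": "4", "Capital of France?": "Paris"}.get(prompt, "unknown")

cases = [
    EvalCase("2+2?", "4"),
    EvalCase("Capital of France?", "Paris"),
    EvalCase("Largest planet?", "Jupiter"),
]

print(exact_match_accuracy(cases, toy_model))  # 2 of 3 cases pass
```

Frameworks listed above (e.g. Promptfoo, DeepEval, LM Eval Harness) generalize this pattern: declarative test cases, pluggable scoring functions, and batch execution against one or more models.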

Ready to Apply?

Submit your application today and take the next step in your career journey with Diverse Lynx.

Apply Now