Overview
Learn how to systematically optimize Large Language Model (LLM) judges for evaluating model outputs at dramatically reduced cost in this 42-minute AutoML Seminars presentation. Discover why human annotation makes LLM evaluation expensive and how LLM-based judges can rank models without human intervention by comparing outputs from different LLMs. Examine the confounding factors that make fair comparisons across research papers difficult, including models, prompts, and hyperparameters that are often changed simultaneously. Master a systematic approach to analyzing and tuning LLM judge hyperparameters with multi-objective, multi-fidelity optimization that balances accuracy against computational cost while substantially reducing the expense of the search itself. Understand how this methodology identifies judges that outperform existing benchmarks in both accuracy and cost-efficiency while relying on open-weight models for accessibility and reproducibility. Access the accompanying research paper and implementation code to apply these cost-effective evaluation strategies in your own LLM projects and research.
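To make the multi-objective, multi-fidelity idea concrete, here is a minimal Python sketch of how such a search could be structured: judge configurations are first screened on a small budget of comparisons, and only Pareto-optimal configurations (highest agreement with human rankings at lowest cost) survive to a larger budget. The hyperparameter names, model names, and the `evaluate_judge` stub are illustrative assumptions, not the presenters' actual method or code; see the accompanying paper and repository for the real implementation.

```python
import itertools
import random

# Hypothetical judge hyperparameter grid (names are illustrative only).
SEARCH_SPACE = {
    "judge_model": ["small-8b", "large-70b"],
    "prompt_style": ["pairwise", "pairwise_with_rubric"],
    "temperature": [0.0, 0.7],
    "n_votes": [1, 3],  # votes aggregated per comparison
}

def evaluate_judge(config, budget):
    """Placeholder: score one judge config on `budget` held-out comparisons.
    Returns (human_agreement, token_cost); replace with a real evaluation."""
    agreement = random.uniform(0.6, 0.9)  # stand-in for measured agreement
    cost = budget * config["n_votes"] * (70 if "70b" in config["judge_model"] else 8)
    return agreement, cost

def pareto_front(results):
    """Keep configs not dominated on (maximize agreement, minimize cost)."""
    front = []
    for cfg, (acc, cost) in results.items():
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for other, (a, c) in results.items() if other != cfg
        )
        if not dominated:
            front.append(cfg)
    return front

# Multi-fidelity loop: screen every config cheaply, then re-evaluate survivors
# at a higher budget so expensive evaluations are spent only on promising judges.
configs = [dict(zip(SEARCH_SPACE, vals)) for vals in itertools.product(*SEARCH_SPACE.values())]
results = {}
for budget in (50, 500):
    results = {tuple(c.items()): evaluate_judge(c, budget) for c in configs}
    configs = [dict(cfg) for cfg in pareto_front(results)]

for cfg, (acc, cost) in results.items():
    print(dict(cfg), f"agreement={acc:.2f}", f"cost={cost}")
```

Swapping the random stub for genuine agreement measurements against a small set of human-annotated comparisons would turn this sketch into a usable, if naive, baseline for the kind of search the talk describes.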
Syllabus
Tuning LLM Judge Design Decisions for 1/1000 of the Cost
Taught by
AutoML Seminars