Overview
Learn how to systematically optimize Large Language Model (LLM) judges for evaluating model outputs at dramatically reduced costs through this 42-minute AutoML Seminars presentation. Discover the challenges of expensive human annotations in LLM evaluation and explore how LLM-based judges can rank models without human intervention by comparing outputs between different LLMs. Examine the confounding factors that make fair comparisons difficult across different research papers, including variations in models, prompts, and hyperparameters that are often changed simultaneously. Master a systematic approach to analyzing and tuning LLM judge hyperparameters using multi-objective multi-fidelity optimization techniques that balance accuracy against computational cost while significantly reducing search expenses. Understand how this methodology identifies judges that outperform existing benchmarks in both accuracy and cost-efficiency while utilizing open-weight models for enhanced accessibility and reproducibility. Access the accompanying research paper and implementation code to apply these cost-effective evaluation strategies in your own LLM projects and research.
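The approach described above can be illustrated with a minimal, self-contained sketch. This is not the presentation's actual code: the `evaluate_judge` stub, the config fields (`true_acc`, `cost_per_pair`), and the specific budgets are all illustrative assumptions. It shows the two core ideas in combination: multi-fidelity search (evaluate many judge configurations cheaply, then re-evaluate only the survivors with a larger budget, here via successive halving) and multi-objective selection (keep the Pareto front over accuracy versus cost rather than a single winner).

```python
import random

random.seed(0)

def evaluate_judge(config, n_pairs):
    """Estimate a judge's agreement with reference labels on n_pairs
    output comparisons. Stub: samples around the config's (unknown)
    true accuracy; a real run would call the judge LLM on each pair."""
    hits = sum(random.random() < config["true_acc"] for _ in range(n_pairs))
    accuracy = hits / n_pairs
    cost = n_pairs * config["cost_per_pair"]
    return accuracy, cost

# Hypothetical judge hyperparameter settings (model, prompt, etc. would
# vary here); accuracy and per-comparison cost are made up for the demo.
configs = [
    {"name": f"judge-{i}",
     "true_acc": random.uniform(0.6, 0.9),
     "cost_per_pair": random.uniform(0.001, 0.05)}
    for i in range(16)
]

# Multi-fidelity: successive halving. Score all configs on a small budget,
# keep the better half, and re-score survivors on a 4x larger budget.
budget, survivors = 8, configs
while len(survivors) > 4:
    scored = [(evaluate_judge(c, budget)[0], c) for c in survivors]
    scored.sort(key=lambda t: t[0], reverse=True)
    survivors = [c for _, c in scored[: len(scored) // 2]]
    budget *= 4

# Multi-objective: keep the Pareto front over (accuracy, cost) — configs
# not dominated by any other in both objectives.
results = [(c, *evaluate_judge(c, budget)) for c in survivors]
pareto = [
    (c, acc, cost) for c, acc, cost in results
    if not any(a2 >= acc and c2 <= cost and (a2 > acc or c2 < cost)
               for _, a2, c2 in results)
]
for c, acc, cost in pareto:
    print(f"{c['name']}: accuracy~{acc:.2f}, cost~${cost:.2f}")
```

In this toy setup the cheap early rounds discard most configurations before any expensive evaluation happens, which is the source of the cost reduction the talk describes; the Pareto set at the end exposes the accuracy/cost trade-off instead of forcing one choice.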
Syllabus
Tuning LLM Judge Design Decisions for 1/1000 of the Cost
Taught by
AutoML Seminars