Overview
Learn how to scale up the evaluation of AI applications through automated evaluation techniques in this comprehensive tutorial. Explore the challenges of evaluating open-ended LLM tasks that typically require human assessment, and discover practical solutions using automated evals. Master the typical LLM workflow and understand the common problems that arise when building AI applications.

Dive deep into two distinct types of automated evaluations and their applications in real-world scenarios. Follow along with a detailed case study featuring an eval-driven LinkedIn Ghostwriter project that demonstrates the complete process, from identifying failure modes to creating LLM judges. Gain hands-on experience with curating user inputs, generating content, applying evaluations, and refining results based on feedback.

Access example code and references to implement these techniques in your own AI projects, and see a live demonstration of the automated evaluation system in action.
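The lesson links to its own example code; as a rough, non-authoritative illustration of the LLM-judge pattern described above, the sketch below grades a draft post against a single failure mode. The model name, the rubric wording, and the emoji-overuse check are assumptions made for this example, not the instructor's actual implementation.

```python
# Minimal sketch of an LLM-judge style automated eval.
# Assumptions: OpenAI's Python SDK, a made-up rubric, and a
# hypothetical failure mode ("overuses emojis") -- not the
# instructor's prompts or code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading a LinkedIn post draft.
Failure mode to check: the post overuses emojis.
Answer with exactly one word: PASS or FAIL.

Post:
{post}
"""

def judge_post(post: str) -> bool:
    """Return True if the draft passes the emoji-overuse check."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(post=post)}],
        temperature=0,  # deterministic grading
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("PASS")

if __name__ == "__main__":
    draft = "Excited to share my new article! 🚀🔥🎉🙌💯"
    print("PASS" if judge_post(draft) else "FAIL")
```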
Syllabus
Introduction - 0:00
The Typical LLM Workflow - 0:21
The Problem - 1:11
Automated Evals - 1:50
2 Types of Automated Evals - 4:25
Example: Eval-driven LinkedIn Ghostwriter - 7:03
Step 1: Identify Failure Modes - 9:36
Step 2: Create LLM Judge - 10:49
Step 3: Curate User Inputs - 19:49
Step 4: Generate LinkedIn Posts - 20:30
Step 5: Apply Evals - 21:12
Step 6: Review Results and Refine - 22:06
The Results - 25:19
Demo - 26:59
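
Steps 3 through 6 above form a loop: curate inputs, generate posts, apply evals, and review failures before refining the prompts. A compact sketch of that loop follows; both helpers are hypothetical stubs rather than the course's code (a real generate_post would call an LLM, and judge_post would be the LLM judge sketched earlier).

```python
# Compact sketch of the eval loop from steps 3-6 above.
def generate_post(idea: str) -> str:
    """Draft a LinkedIn post from a user-supplied idea (stub)."""
    return f"Here's a thought on {idea}... 🚀"

def judge_post(post: str) -> bool:
    """Grade a draft against one failure mode (stub for the LLM judge)."""
    return "🚀" not in post

# Step 3: curate user inputs
ideas = ["automated evals", "LLM judges", "prompt engineering"]

# Steps 4-5: generate posts and apply evals
drafts = [(idea, generate_post(idea)) for idea in ideas]
graded = [(idea, post, judge_post(post)) for idea, post in drafts]

# Step 6: review failures so the generation prompt can be refined
for idea, post, passed in graded:
    if not passed:
        print(f"FAIL [{idea}]: {post}")
```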
Taught by
Shaw Talebi