Overview
Learn how to scale the evaluation of AI applications using automated evaluation techniques. This tutorial examines the challenge of evaluating open-ended LLM tasks that typically require human assessment, and presents practical solutions using automated evals. It walks through the typical LLM workflow, the common problems that arise when building AI applications, and two distinct types of automated evaluations with their real-world applications. A detailed case study, an eval-driven LinkedIn Ghostwriter, demonstrates the complete process: identifying failure modes, creating LLM judges, curating user inputs, generating content, applying evaluations, and refining results based on feedback. Example code and references are provided for implementing these techniques in your own AI projects, along with a live demonstration of the automated evaluation system in action.
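The LLM-judge idea described above can be sketched as a small Python helper. The prompt wording, the `call_llm` stub, and the example criteria below are illustrative assumptions, not the exact code from the video:

```python
# Minimal LLM-as-a-judge sketch. The prompt wording, criteria, and the
# call_llm stub are assumptions for illustration, not the video's code.

JUDGE_PROMPT = """You are evaluating a LinkedIn post against one criterion.

Criterion: {criterion}

Post:
{post}

Answer with exactly PASS or FAIL."""


def build_judge_prompt(post: str, criterion: str) -> str:
    """Fill the judge template for one (post, criterion) pair."""
    return JUDGE_PROMPT.format(criterion=criterion, post=post)


def parse_verdict(response: str) -> bool:
    """Map the judge's raw text to a boolean pass/fail."""
    return response.strip().upper().startswith("PASS")


def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-completion API call."""
    # A real implementation would call an LLM API here.
    return "PASS"


def judge_post(post: str, criteria: list[str]) -> dict[str, bool]:
    """Run the judge once per criterion and collect the verdicts."""
    return {c: parse_verdict(call_llm(build_judge_prompt(post, c)))
            for c in criteria}
```

Because the judge returns a structured pass/fail per criterion, the same loop scales from a handful of manually reviewed posts to hundreds of generated ones.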
Syllabus
Introduction - 0:00
The Typical LLM Workflow - 0:21
The Problem - 1:11
Automated Evals - 1:50
2 Types of Automated Evals - 4:25
Example: Eval-driven LinkedIn Ghostwriter - 7:03
Step 1: Identify Failure Modes - 9:36
Step 2: Create LLM Judge - 10:49
Step 3: Curate User Inputs - 19:49
Step 4: Generate LI Posts - 20:30
Step 5: Apply Evals - 21:12
Step 6: Review Results and Refine - 22:06
The Results - 25:19
Demo - 26:59
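The six case-study steps in the syllabus can be sketched end to end. All function names, the mock generator, and the mock judge below are assumptions chosen to make the loop runnable, not the ghostwriter's actual implementation:

```python
# Sketch of the six-step eval-driven loop from the syllabus. Function
# names and the mocked generator/judge are illustrative assumptions.

FAILURE_MODES = ["uses clickbait", "too long"]            # Step 1


def llm_judge(post: str, failure_mode: str) -> bool:
    """Step 2: True if the post avoids the failure mode (mocked rules
    standing in for an LLM judge)."""
    if failure_mode == "too long":
        return len(post.split()) <= 50
    return "you won't believe" not in post.lower()


def curate_inputs() -> list[str]:
    """Step 3: a small fixed set of user prompts (mocked)."""
    return ["Announce my new job", "Share a lesson from a failed project"]


def generate_post(user_input: str) -> str:
    """Step 4: stand-in for the ghostwriter LLM call."""
    return f"Here is a short, honest post about: {user_input}."


def apply_evals(posts: list[str]) -> list[dict[str, bool]]:
    """Step 5: run every judge on every generated post."""
    return [{fm: llm_judge(p, fm) for fm in FAILURE_MODES} for p in posts]


def pass_rate(results: list[dict[str, bool]]) -> float:
    """Step 6: aggregate verdicts into one number to guide refinement."""
    checks = [ok for r in results for ok in r.values()]
    return sum(checks) / len(checks)


posts = [generate_post(x) for x in curate_inputs()]
print(f"pass rate: {pass_rate(apply_evals(posts)):.0%}")
```

In practice, Step 6 feeds back into the generator prompt: failing criteria point to the instruction that needs tightening, and the loop is rerun until the pass rate stops improving.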
Taught by
Shaw Talebi