How to Evaluate AI Agents - Part 2
- 1 0:00 - Introduction and Series Overview
- 2 1:26 - Focus of Today: Evaluating AI Agents
- 3 2:10 - Agent Components Overview (Router, Skills, Path)
- 4 4:39 - How to Evaluate a Router
- 5 6:10 - How to Evaluate Skills (API, RAG, Code)
- 6 7:37 - Evaluating Agent Paths (Trajectory Eval)
- 7 9:52 - Evaluation Techniques Overview
- 8 10:15 - Technique 1: LLM as a Judge
- 9 19:44 - Technique 2: Code-Based Evaluation
- 10 22:08 - Technique 3: Human Annotations
- 11 24:24 - Live Demo: Evaluating a Travel Agent
- 12 27:03 - Example of LLM-as-a-Judge in Action
- 13 30:11 - How to Build and Apply Evaluation Templates
- 14 34:50 - Using Test Datasets for Evaluation
- 15 42:04 - Guardrails and Prompt Injection Detection
- 16 46:04 - Summary: Combining Techniques in Dev & Prod
- 17 48:30 - Multimodal Evaluation Note (Voice, Image, Video)
- 18 49:16 - Final Wrap-Up and Next Steps