Syllabus
0:00 - Introduction and Series Overview
1:26 - Today's Focus: Evaluating AI Agents
2:10 - Agent Components Overview (Router, Skills, Path)
4:39 - How to Evaluate a Router
6:10 - How to Evaluate Skills (API, RAG, Code)
7:37 - Evaluating Agent Paths (Trajectory Eval)
9:52 - Evaluation Techniques Overview
10:15 - Technique 1: LLM-as-a-Judge
19:44 - Technique 2: Code-Based Evaluation
22:08 - Technique 3: Human Annotations
24:24 - Live Demo: Evaluating a Travel Agent
27:03 - Example of LLM-as-a-Judge in Action
30:11 - How to Build and Apply Evaluation Templates
34:50 - Using Test Datasets for Evaluation
42:04 - Guardrails and Prompt Injection Detection
46:04 - Summary: Combining Techniques in Dev & Prod
48:30 - Multimodal Evaluation Note (Voice, Image, Video)
49:16 - Final Wrap-Up and Next Steps
Taught by
Data Science Dojo