AI Agent Evaluation - Methods and Best Practices for Measuring Agent Performance
Elvis Saravia via YouTube
Syllabus
00:00 Intros
02:56 Introduction to AI Agent Evaluation
09:06 5 Steps to Evaluation Intelligence
11:13 Components of An Agent
15:08 Integrated Observability
15:49 Play 1 - Create high & low-level metrics
28:12 Play 2 - Select your experimental infrastructure
32:31 Play 3 - Optimize your instructions
34:50 Play 4 - Optimize your retrieval
38:02 Play 5 - Add agentic tests to CI/CD
39:46 Play 6 - Build SLMs for real-time monitoring
43:46 Play 7 - Curate large-scale eval sets
46:21 Play 8 - Improve metric accuracy with human feedback
49:27 Building Your Evaluation System
51:23 Agent Leaderboard v2 launch
57:49 Agent Leaderboard v2 ranking
1:02:20 Cost vs Performance
1:05:56 Evaluation Best Practices
1:08:34 The Triad of Tradeoffs
1:10:18 Q&A
Taught by
Elvis Saravia