How to Evaluate AI Agents - Part 2

Data Science Dojo via YouTube

Classroom Contents

  1. 0:00 - Introduction and Series Overview
  2. 1:26 - Focus of Today: Evaluating AI Agents
  3. 2:10 - Agent Components Overview (Router, Skills, Path)
  4. 4:39 - How to Evaluate a Router
  5. 6:10 - How to Evaluate Skills (API, RAG, Code)
  6. 7:37 - Evaluating Agent Paths (Trajectory Eval)
  7. 9:52 - Evaluation Techniques Overview
  8. 10:15 - Technique 1: LLM as a Judge
  9. 19:44 - Technique 2: Code-Based Evaluation
  10. 22:08 - Technique 3: Human Annotations
  11. 24:24 - Live Demo: Evaluating a Travel Agent
  12. 27:03 - Example of LLM-as-a-Judge in Action
  13. 30:11 - How to Build and Apply Evaluation Templates
  14. 34:50 - Using Test Datasets for Evaluation
  15. 42:04 - Guardrails and Prompt Injection Detection
  16. 46:04 - Summary: Combining Techniques in Dev & Prod
  17. 48:30 - Multimodal Evaluation Note (Voice, Image, Video)
  18. 49:16 - Final Wrap-Up and Next Steps
