Overview
Learn to systematically evaluate AI agents with an open source framework that breaks performance assessment into three dimensions: tool use, trajectory, and goal. Tool use evaluation examines each step, from tool selection and parameter capture to execution, to confirm that individual components operate correctly. Trajectory evaluation scrutinizes the agent's overall workflow to verify that it follows an optimal, efficient sequence of actions. Goal evaluation quantitatively determines whether the agent achieves the specified outcome. See how this methodology pinpoints failure points in each dimension and yields actionable insights for iterative improvement. The result is a robust, reproducible approach to benchmarking and optimizing AI agents that bridges the gap between experimental development and reliable production deployment of LLM-based systems handling complex, multi-step tasks.
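To make the three dimensions concrete, here is a minimal Python sketch of how each one might be scored. It is not the framework presented in the talk: every name in it (`ToolCall`, `tool_use_score`, `trajectory_score`, `goal_score`) is hypothetical, and the scoring rules (position-aligned step matching, longest common subsequence of tool names, substring checks on the final answer) are simplifying assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One step an agent took (hypothetical record format)."""
    name: str
    params: dict[str, Any] = field(default_factory=dict)
    succeeded: bool = True


def tool_use_score(actual: list[ToolCall], expected: list[ToolCall]) -> float:
    """Tool use: fraction of expected steps whose position-aligned actual step
    picked the same tool, captured the same parameters, and executed successfully."""
    if not expected:
        return 1.0
    hits = sum(
        1
        for a, e in zip(actual, expected)
        if a.name == e.name and a.params == e.params and a.succeeded
    )
    return hits / len(expected)


def trajectory_score(actual: list[ToolCall], expected: list[ToolCall]) -> float:
    """Trajectory: longest common subsequence of tool names divided by the
    longer sequence length, so both missing and extra steps lower the score."""
    a = [c.name for c in actual]
    e = [c.name for c in expected]
    if not a and not e:
        return 1.0
    # Standard LCS dynamic-programming table.
    dp = [[0] * (len(e) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(e):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(e)] / max(len(a), len(e))


def goal_score(final_output: str, required_facts: list[str]) -> float:
    """Goal: fraction of required facts that appear in the final answer
    (a crude case-insensitive substring check, assumed for illustration)."""
    if not required_facts:
        return 1.0
    text = final_output.lower()
    return sum(fact.lower() in text for fact in required_facts) / len(required_facts)


if __name__ == "__main__":
    # Made-up example: the agent inserts an unnecessary weather lookup.
    expected = [
        ToolCall("search_flights", {"dest": "NRT"}),
        ToolCall("book_flight", {"id": "F1"}),
    ]
    actual = [
        ToolCall("search_flights", {"dest": "NRT"}),
        ToolCall("get_weather", {"city": "Tokyo"}),
        ToolCall("book_flight", {"id": "F1"}),
    ]
    print("tool use:  ", tool_use_score(actual, expected))
    print("trajectory:", trajectory_score(actual, expected))
    print("goal:      ", goal_score("Booked flight F1 to NRT.", ["F1", "NRT"]))
```

Keeping the three scores separate, rather than collapsing them into one number, is what lets a failure be attributed to a specific dimension: in the made-up run above the agent reaches its goal, but the extra step lowers both the tool use and trajectory scores.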
Syllabus
Who Let the Bots Out? A Guide to Evaluating AI Agents - James Cha-Earley, Snowflake
Taught by
Linux Foundation