Overview
Explore the challenges of building and evaluating AI agents in production in this 47-minute conference talk featuring Willem Pienaar (CTO of Cleric) and Shreya Shankar (PhD student in data management for machine learning). The talk addresses a core problem: how to evaluate agents when "ground truth" is ambiguous and subjective user feedback is not enough to improve performance. Learn about the three "gulfs" of human-AI interaction (Specification, Generalization, and Comprehension) and how each one affects agent success rates. Discover strategies for taking humans "out of the loop" of feedback collection and for building faster learning cycles from implicit signals instead of manual review. Examine practical evaluation techniques, including task failure analysis with heat maps, and weigh the trade-offs of testing AI agents in simulated environments. Understand why AI systems hit performance ceilings and how to sort problems into three categories: what your agent can solve now, what it can learn to solve, and what it will likely never be able to solve. Further topics include trust issues in AI data, cloud clarity and retrieval, closing communication gaps, smarter feedback for prompts, creative data exploration, custom versus general-purpose AI, sharpening agent skills, detecting repeat failures, self-healing software, and the difficulty of monitoring AI systems in production.
Syllabus
[00:00] Trust Issues in AI Data
[04:49] Cloud Clarity Meets Retrieval
[09:37] Why Fast AI Is Hard
[11:10] Fixing AI Communication Gaps
[14:53] Smarter Feedback for Prompts
[19:23] Creativity Through Data Exploration
[23:46] Helping Engineers Solve Faster
[26:03] The Three Gaps in AI
[28:08] Alerts Without the Noise
[33:22] Custom vs General AI
[34:14] Sharpening Agent Skills
[40:01] Catching Repeat Failures
[43:38] Rise of Self-Healing Software
[44:12] The Chaos of Monitoring AI
Taught by
MLOps.community