Everything Hard About Building AI Agents Today

MLOps.community via YouTube

Overview

Explore the challenges of building and evaluating AI agents in production in this 47-minute conference talk featuring Willem Pienaar (CTO of Cleric) and Shreya Shankar (PhD student in data management for machine learning). Examine the fundamental problem of evaluating agents when "ground truth" is ambiguous and subjective user feedback is not enough to improve performance. Learn about the three "gulfs" of human-AI interaction (Specification, Generalization, and Comprehension) and how they affect agent success rates. Discover strategies for moving humans "out of the loop" for feedback collection, so that learning cycles run on implicit signals rather than manual review. Look at practical evaluation techniques, including task failure analysis using heat maps, and weigh the trade-offs of testing agents in simulated environments. Understand why AI systems hit performance ceilings, and learn to sort problems into three categories: what your agent can solve now, what it can learn to solve, and what it will likely never be able to solve. The talk also touches on trust issues in AI data, retrieval systems, communication gaps, feedback mechanisms for prompts, data exploration, custom versus general AI, sharpening agent skills, detecting repeat failures, self-healing software, and monitoring AI systems in production.
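
As a rough illustration of the heat-map style failure analysis mentioned above, the sketch below tabulates how often each failure mode occurs for each task type. This listing does not describe how the speakers actually build their heat maps, so everything here is an assumption: the run records, the task names, and the failure-mode labels are hypothetical, and the grid is rendered as plain text rather than a colored plot.

```python
from collections import Counter

# Hypothetical evaluation records: (task_type, failure_mode).
# A failure_mode of None marks a successful run. All names are illustrative.
RUNS = [
    ("alert_triage", None),
    ("alert_triage", "wrong_tool"),
    ("log_search", "timeout"),
    ("log_search", "timeout"),
    ("root_cause", "hallucinated_fact"),
    ("root_cause", "wrong_tool"),
]


def failure_matrix(runs):
    """Count failures per (task_type, failure_mode) pair, skipping successes."""
    counts = Counter((task, mode) for task, mode in runs if mode is not None)
    tasks = sorted({task for task, _ in runs})
    modes = sorted({mode for _, mode in runs if mode is not None})
    # Rows are task types, columns are failure modes.
    grid = [[counts[(t, m)] for m in modes] for t in tasks]
    return tasks, modes, grid


def render(tasks, modes, grid):
    """Return the grid as aligned plain-text rows."""
    width = max(len(t) for t in tasks)
    lines = [" " * width + "  " + "  ".join(modes)]
    for task, row in zip(tasks, grid):
        cells = "  ".join(str(n).rjust(len(m)) for n, m in zip(row, modes))
        lines.append(task.ljust(width) + "  " + cells)
    return "\n".join(lines)


tasks, modes, grid = failure_matrix(RUNS)
print(render(tasks, modes, grid))
```

Cells with high counts point at task and failure-mode combinations that recur, which is one way to decide where to look first. The same grid could be passed to a plotting library to get a colored heat map.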

Syllabus

[00:00] Trust Issues in AI Data
[04:49] Cloud Clarity Meets Retrieval
[09:37] Why Fast AI Is Hard
[11:10] Fixing AI Communication Gaps
[14:53] Smarter Feedback for Prompts
[19:23] Creativity Through Data Exploration
[23:46] Helping Engineers Solve Faster
[26:03] The Three Gaps in AI
[28:08] Alerts Without the Noise
[33:22] Custom vs General AI
[34:14] Sharpening Agent Skills
[40:01] Catching Repeat Failures
[43:38] Rise of Self-Healing Software
[44:12] The Chaos of Monitoring AI