Overview
Learn to implement a systematic, metric-driven framework for detecting and correcting problematic behaviors in production LLM agent systems through this technical conference talk. Discover how to instrument agent loops with comprehensive observability signals, including tool-selection quality, error rates, action progression, latency, and domain-specific metrics, then integrate these into evaluation layers like Galileo for continuous system improvement.

Explore the challenges that arise when prompts, retrieval systems, external data, and policies interact unpredictably, causing agents to drift into failure states. Follow a practical demonstration using a stock-trading system example that illustrates how brittle retrieval and faulty business logic lead to undesirable agent behavior, then see how to systematically refactor prompts and adjust retrieval pipelines while verifying improvements through enhanced metrics.

Master techniques for adding observability with minimal code changes, pinpointing root causes through detailed tracing, and establishing a virtuous cycle of continuous, metric-validated system enhancement for agentic AI systems operating at production scale.
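The instrumentation idea above — wrapping each tool invocation so the loop emits latency and error-rate signals with minimal code changes — can be sketched in Python. This is an illustrative example only, not the speaker's actual implementation; the `AgentMetrics` class and `instrumented_call` wrapper are hypothetical names, and a real system would forward these signals to an evaluation layer such as Galileo rather than keep them in memory.

```python
import time
from dataclasses import dataclass, field

@dataclass
class AgentMetrics:
    """Accumulates observability signals for one agent run (hypothetical schema)."""
    tool_calls: int = 0
    tool_errors: int = 0
    latencies: list = field(default_factory=list)  # per-call wall-clock seconds

    @property
    def error_rate(self) -> float:
        """Fraction of tool calls that raised, a basic tool-quality signal."""
        return self.tool_errors / self.tool_calls if self.tool_calls else 0.0


def instrumented_call(metrics: AgentMetrics, tool_fn, *args, **kwargs):
    """Wrap one tool invocation: record latency and errors, then re-raise.

    The agent loop swaps `tool_fn(...)` for `instrumented_call(metrics, tool_fn, ...)`,
    so observability is added without changing the tools themselves.
    """
    start = time.perf_counter()
    metrics.tool_calls += 1
    try:
        return tool_fn(*args, **kwargs)
    except Exception:
        metrics.tool_errors += 1
        raise
    finally:
        metrics.latencies.append(time.perf_counter() - start)
```

In the talk's stock-trading scenario, a call like `instrumented_call(metrics, get_quote, "ACME")` would record the retrieval tool's latency and failures, and a rising `error_rate` across runs is the kind of signal that flags drift into a failure state.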
Syllabus
Taming Rogue AI Agents with Observability-Driven Evaluation
Taught by
Databricks