Overview
Learn to build a comprehensive monitoring and improvement infrastructure for AI agents in production in this 14-minute conference talk. Discover how to move beyond initial deployment to create systems that continuously monitor, evaluate, and enhance agent performance in real-world environments. Explore how evaluation layers, tracing systems, observability tools, experimentation frameworks, and optimization processes integrate into a cohesive feedback loop that enables agents to adapt and improve over time.

Examine practical strategies for composing evaluation systems that surface meaningful failure modes, implementing deep tracing and instrumentation for complete visibility into agent behavior, designing experiments that drive measurable improvements, and establishing feedback-driven optimization cycles. Address the critical but often overlooked need to evolve evaluation methods alongside agent improvements, and gain insights from real-world deployments on handling agent drift, silent failures, and unexpected edge cases that emerge in production systems.
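The feedback loop described above can be sketched in miniature. The following is an illustrative Python sketch, not code from the talk: the `TraceEvent`, `evaluate`, and `FeedbackLoop` names are hypothetical, the evaluator treats empty step outputs as silent failures, and drift is flagged when the mean score of recent runs drops well below an earlier baseline.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """One instrumented step from an agent run (hypothetical trace schema)."""
    step: str
    output: str


@dataclass
class EvalResult:
    score: float
    failure_modes: list


def evaluate(trace):
    """Toy evaluator: flags steps with empty output as silent failures
    and scores the run by the fraction of steps that produced output."""
    failures = [event.step for event in trace if not event.output.strip()]
    score = 1.0 - len(failures) / max(len(trace), 1)
    return EvalResult(score=score, failure_modes=failures)


class FeedbackLoop:
    """Accumulates evaluation scores across runs and signals drift when
    the recent window scores markedly worse than the initial baseline."""

    def __init__(self, window=5, drift_threshold=0.2):
        self.window = window
        self.drift_threshold = drift_threshold
        self.history = []

    def record(self, result):
        self.history.append(result.score)

    def drift_detected(self):
        # Need a full baseline window and a full recent window to compare.
        if len(self.history) < 2 * self.window:
            return False
        baseline = statistics.mean(self.history[: self.window])
        recent = statistics.mean(self.history[-self.window :])
        return baseline - recent > self.drift_threshold
```

A real deployment would replace the toy evaluator with task-specific checks and feed drift signals into the experimentation and optimization stages the talk covers; the point here is only the shape of the loop: trace, evaluate, accumulate, compare.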
Syllabus
Break It 'Til You Make It: Building the Self-Improving Stack for AI Agents - Aparna Dhinakaran
Taught by
AI Engineer