AI Agent Evals - From Testing to Trust
MLOps World: Machine Learning in Production via YouTube
Overview
Discover how to build reliable AI systems that move successfully from demo environments to production through comprehensive evaluation strategies in this 27-minute conference talk. Learn why evaluation is the foundation of trust in AI agents and how leading teams integrate testing and monitoring throughout the product development lifecycle. Understand practical approaches to optimizing context with write, select, compress, and isolate strategies, illustrated by real-world case studies from both startups and enterprises. Gain insights into designing scalable LLM evaluation workflows that maintain quality and reliability across environments, and see which practices hold up when evaluation pipelines run in production and which commonly fail.
Syllabus
AI Agent Evals: From Testing to Trust | Vaibhavi Gangwar, Maxim AI
Taught by
MLOps World: Machine Learning in Production