Overview
Learn how to systematically improve AI agents through Eval-Driven Development (EDD), a scientific approach that replaces intuitive "vibe checks" with rigorous experimentation and evaluation. Discover a mental framework for consistently improving agent performance by applying the scientific method to machine learning systems development. Explore techniques for using large language models as proxies for human judgment in evaluation, and understand how to build data flywheels that improve alignment between AI systems and desired outcomes. Learn to select appropriate metrics for measuring agent performance and to establish feedback loops from production that identify and address the long-tail scenarios affecting system reliability. Gain insight into turning AI agent development from an art form into a systematic, science-based engineering discipline that delivers predictable improvements over time.
Syllabus
EDD: The Science of Improving AI Agents // Shahul Elavakkattil Shereef // Agents in Production 2025
Taught by
MLOps.community