Evaluation-Driven Development Workflows - Best Practices and Real-World Scenarios
Databricks via YouTube
Overview
Learn how to implement Evaluation-Driven Development (EDD) workflows in enterprise AI systems through this 42-minute conference talk from Databricks. Discover how EDD embeds continuous assessment and improvement into the AI development lifecycle to keep systems reliable and efficient.

Explore techniques for creating high-quality evaluation datasets, including document analysis, synthetic data generation with the Mosaic AI synthetic data generation API, subject matter expert validation, and relevance filtering to reduce manual effort and accelerate workflows. Understand key evaluation metrics such as context relevance, groundedness, and response accuracy, and use them to identify and address common issues like retrieval errors and model limitations. Master the development of custom LLM judges tailored to domain-specific requirements, including PII detection and tone assessment.

Gain hands-on insight into leveraging tools like the Mosaic AI Agent Framework, Agent Evaluation, and MLflow to automate data tracking, streamline workflows, and quantify improvements. Transform your AI development approach to deliver scalable, high-performing systems that create measurable organizational value through systematic evaluation practices and real-world implementation scenarios.
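The custom domain-specific judges mentioned above (e.g. PII detection) can be sketched as a scoring function applied across an evaluation set. The example below is a hypothetical illustration, not the talk's implementation: a production judge would typically wrap an LLM call (for instance via MLflow's GenAI metric utilities), but here a regex-based PII check stands in so the idea stays self-contained and runnable.

```python
import re

# Hypothetical PII patterns; a real judge would use an LLM or a dedicated
# PII-detection service rather than a handful of regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def pii_judge(response: str) -> dict:
    """Judge a single model response: 'fail' if any PII pattern matches."""
    findings = [name for name, pat in PII_PATTERNS.items() if pat.search(response)]
    return {"verdict": "fail" if findings else "pass", "findings": findings}

def pass_rate(responses: list[str]) -> float:
    """Aggregate the judge over an evaluation set: fraction of passing responses."""
    results = [pii_judge(r) for r in responses]
    return sum(r["verdict"] == "pass" for r in results) / len(results)
```

Tracking an aggregate score like `pass_rate` per experiment run (for example, by logging it as an MLflow metric) is what lets a team quantify whether a change to the system actually improved it.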
Syllabus
Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios
Taught by
Databricks