AI Evaluation from First Principles: You Can't Manage What You Can't Measure
Databricks via YouTube
Overview
Learn to build effective evaluation systems for GenAI applications in this 39-minute research session on the challenge of measuring AI quality in organizations. Discover the fundamental principles of GenAI evaluation and practical frameworks for establishing reliable metrics, even for subjective assessments. Explore techniques for calibrating LLM judges so that evaluation stays cost-effective and scalable, and adapts as your AI capabilities evolve. Learn actionable approaches for defining quality metrics tailored to your specific use cases, turning uncertain AI development into measurable, systematic improvement and making clear what is working and what needs attention in your implementations. Presented by Jonathan Frankle, Chief Scientist (Neural Networks) at Databricks, and Pallavi Koppol, Research Scientist at Databricks, the session offers essential knowledge for developers, AI solution implementers, and technical leaders who want to move beyond guesswork toward data-driven AI quality management using Databricks tools and methodologies.
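To make the "calibrating LLM judges" idea concrete: a common first step is to have the LLM judge and human graders label the same small set of responses, then measure how well the judge agrees with the humans before trusting it at scale. The sketch below shows that step; the function names and the labeled data are illustrative assumptions, not code from the session.

```python
# A minimal sketch of LLM-judge calibration: compare the judge's verdicts
# against human labels on a shared calibration set. Data is hypothetical.
from collections import Counter

def agreement_rate(judge_labels, human_labels):
    """Fraction of examples where the LLM judge matches the human label."""
    assert len(judge_labels) == len(human_labels)
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)

def cohens_kappa(judge_labels, human_labels):
    """Chance-corrected agreement: 1.0 is perfect, 0.0 is chance level."""
    n = len(human_labels)
    observed = agreement_rate(judge_labels, human_labels)
    # Expected agreement if judge and human labeled independently
    # at their observed marginal rates.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(
        (judge_counts[label] / n) * (human_counts[label] / n)
        for label in set(judge_labels) | set(human_labels)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical calibration set: humans and an LLM judge each graded
# the same 10 responses as "pass" or "fail".
human = ["pass", "pass", "fail", "pass", "fail",
         "pass", "fail", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "fail", "fail",
         "pass", "fail", "pass", "pass", "pass"]

print(f"raw agreement: {agreement_rate(judge, human):.2f}")  # 0.80
print(f"Cohen's kappa: {cohens_kappa(judge, human):.2f}")    # ~0.58
```

Raw agreement can look flattering when one label dominates, which is why a chance-corrected statistic like kappa is the more honest calibration check; a low kappa suggests the judge prompt or rubric needs revision before the judge replaces human review.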
Syllabus
AI Evaluation from First Principles: You Can't Manage What You Can't Measure
Taught by
Databricks