AI Evaluation from First Principles: You Can't Manage What You Can't Measure
Databricks via YouTube
Overview
Learn to build effective evaluation systems for GenAI applications in this 39-minute research session on the challenge of measuring AI quality in organizations. Discover the fundamental principles of GenAI evaluation and practical frameworks for establishing reliable metrics, even for subjective assessments. Explore techniques for calibrating LLM judges so that evaluation stays cost-effective and scalable, and adapts as your AI capabilities evolve. Learn actionable approaches for defining quality metrics tailored to your specific use cases, turning uncertain AI development into measurable, systematic improvement and making clear what is working and what needs attention in your implementations. Presented by Jonathan Frankle, Chief Scientist (Neural Networks) at Databricks, and Pallavi Koppol, Research Scientist at Databricks, the session offers essential knowledge for developers, AI solution implementers, and technical leaders who want to move beyond guesswork toward data-driven AI quality management using Databricks tools and methodologies.
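To make the "calibrating LLM judges" idea concrete: a common first step is to have the LLM judge and human graders label the same small set of responses, then measure how well the judge agrees with the humans before trusting it at scale. The sketch below shows that step; the function names and the labeled data are illustrative assumptions, not code from the session.

```python
# A minimal sketch of LLM-judge calibration: compare the judge's verdicts
# against human labels on a shared calibration set. Data is hypothetical.
from collections import Counter

def agreement_rate(judge_labels, human_labels):
    """Fraction of examples where the LLM judge matches the human label."""
    assert len(judge_labels) == len(human_labels)
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)

def cohens_kappa(judge_labels, human_labels):
    """Chance-corrected agreement: 1.0 is perfect, 0.0 is chance level."""
    n = len(human_labels)
    observed = agreement_rate(judge_labels, human_labels)
    # Expected agreement if judge and human labeled independently
    # at their observed marginal rates.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(
        (judge_counts[label] / n) * (human_counts[label] / n)
        for label in set(judge_labels) | set(human_labels)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical calibration set: humans and an LLM judge each graded
# the same 10 responses as "pass" or "fail".
human = ["pass", "pass", "fail", "pass", "fail",
         "pass", "fail", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "fail", "fail",
         "pass", "fail", "pass", "pass", "pass"]

print(f"raw agreement: {agreement_rate(judge, human):.2f}")  # 0.80
print(f"Cohen's kappa: {cohens_kappa(judge, human):.2f}")    # ~0.58
```

Raw agreement can look flattering when one label dominates, which is why a chance-corrected statistic like kappa is the more honest calibration check; a low kappa suggests the judge prompt or rubric needs revision before the judge replaces human review.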
Syllabus
AI Evaluation from First Principles: You Can't Manage What You Can't Measure
Taught by
Databricks