Overview
Learn AI evaluations through a hands-on tutorial in which two product managers build an evaluation system from scratch for an AI customer support agent. The session covers the four essential types of AI evaluations every practitioner should understand, then walks through the full process of creating an effective evaluation framework: defining evaluation criteria, using Anthropic's console to generate strong prompts, and adding human labels to a golden dataset. It then turns to scaling evaluations with LLM-judge prompts and aligning those judges with human judgment so the results stay reliable. By the end, you will have seen how to build a robust evaluation system that measures AI performance in real-world applications.
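The workflow described above (evaluation criteria, a human-labeled golden dataset, and an LLM judge) can be sketched in a few lines of Python. The example below is an illustrative sketch, not code from the video: the dataset rows, the criterion wording, and the use of the Anthropic Python SDK are assumptions, and you would substitute your own support-agent transcripts and criteria.

```python
# Sketch: grading support-agent replies against one evaluation criterion
# with an LLM judge. Assumes the Anthropic Python SDK is installed and
# ANTHROPIC_API_KEY is set; the dataset rows and criterion text are made up.
from anthropic import Anthropic

client = Anthropic()

# Tiny "golden dataset": each row holds the customer message, the agent's
# reply, and a human pass/fail label for the criterion being tested.
golden_dataset = [
    {
        "customer": "My order arrived damaged, what do I do?",
        "agent_reply": "Sorry to hear that! I've started a replacement order for you.",
        "human_label": "pass",
    },
    {
        "customer": "Can I get a refund for last month's charge?",
        "agent_reply": "Refunds are not something I can discuss.",
        "human_label": "fail",
    },
]

CRITERION = "The reply acknowledges the customer's problem and offers a concrete next step."

JUDGE_PROMPT = """You are grading a customer support reply.
Criterion: {criterion}

Customer message: {customer}
Agent reply: {agent_reply}

Answer with a single word, pass or fail."""

def judge(row: dict) -> str:
    """Ask the LLM judge for a pass/fail label on one dataset row."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # swap in whatever model you have access to
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(criterion=CRITERION, **row),
        }],
    )
    return response.content[0].text.strip().lower()

if __name__ == "__main__":
    for row in golden_dataset:
        print(judge(row), "| human said:", row["human_label"])
```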
Syllabus
00:00 What are AI evals and how to get good at them
02:52 The 4 types of AI evaluations everyone should know
06:08 Live demo: Building evals for a customer support agent
10:29 Using Anthropic's console to generate great prompts
15:13 Creating the evaluation criteria
17:40 Adding human labels to the golden dataset
31:05 Scaling evals with LLM-judge prompts
38:21 How to align LLM judges with human judgment
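The last syllabus item, aligning LLM judges with human judgment, usually comes down to measuring how often the judge agrees with the human labels in the golden dataset and iterating on the judge prompt until agreement is high enough to trust. A minimal sketch of that agreement check follows; the label values and the 90% threshold are placeholders for illustration, not figures from the video.

```python
# Sketch: measuring judge/human agreement on a labeled golden dataset.
# The labels below are placeholders; in practice they come from the human
# annotations and from running the LLM-judge prompt over the same rows.
human_labels = ["pass", "fail", "pass", "pass", "fail"]
judge_labels = ["pass", "fail", "fail", "pass", "fail"]

matches = sum(h == j for h, j in zip(human_labels, judge_labels))
agreement = matches / len(human_labels)

print(f"Judge agrees with humans on {agreement:.0%} of examples")

# Assumed rule of thumb: keep revising the judge prompt until agreement
# clears a bar you trust (e.g. 90%) before scaling it to unlabeled
# production transcripts.
if agreement < 0.9:
    print("Judge prompt needs more work before it can replace human review.")
```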
Taught by
Peter Yang