Overview
Syllabus
00:00 Introduction and Session Overview
00:49 Speaker Introductions
01:47 Evaluating AI Systems: Beyond Traditional Metrics
02:28 Key Dimensions of AI Evaluation
04:50 Limitations of Traditional Metrics: BLEU and ROUGE
06:16 Real-World Examples Highlighting Metric Flaws
10:17 Understanding N-gram Overlap
13:12 Modern Evaluation Frameworks
14:44 Factual Accuracy and FActScore
20:32 Addressing Toxicity and Bias in AI
23:31 Human-Centric Evaluation Methods
24:35 Conclusion and Transition
25:38 Introduction to Evaluation Shifts
25:50 Strengths of Large Language Models in Evaluation
26:43 Automated Evaluation: Importance and Benefits
27:24 Key Technologies in Automated Evaluation
28:35 Deep Dive: OpenAI Evals Framework
32:40 Generative Evaluation (GE) Explained
35:44 Evaluation of Retrieval-Augmented Generation (RAG) Systems
38:19 Human Judgment vs. Automated Metrics
41:23 Building Robust Evaluation Ecosystems
43:05 Real-World Scenario: Evaluating Customer Service AI
45:22 Conclusion and Future of Evaluation
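Several chapters above (on BLEU/ROUGE limitations and n-gram overlap) turn on one idea: overlap metrics score surface word matches, not meaning. A minimal sketch of BLEU-style modified n-gram precision illustrates this; the function names and example sentences are illustrative, not from the session.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference,
    with clipped counts (as in BLEU's modified precision)."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0

# Two paraphrases with the same meaning but little word overlap
# receive a low unigram precision (~0.17), exposing the metric's flaw:
ref = "the cat sat on the mat"
cand = "a feline rested upon the rug"
print(overlap_precision(cand, ref, n=1))
```

An exact copy of the reference scores 1.0, while a faithful paraphrase scores near zero, which is precisely the failure mode the session attributes to BLEU and ROUGE.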
Taught by
Conf42