Class Central Classrooms (beta): YouTube videos curated by Class Central.
Beyond BLEU and ROUGE - Evaluating LLMs and AI Systems

Classroom Contents
- 1 00:00 Introduction and Session Overview
- 2 00:49 Speaker Introductions
- 3 01:47 Evaluating AI Systems: Beyond Traditional Metrics
- 4 02:28 Key Dimensions of AI Evaluation
- 5 04:50 Limitations of Traditional Metrics: BLEU and ROUGE
- 6 06:16 Real-World Examples Highlighting Metric Flaws
- 7 10:17 Understanding N-gram Overlap
- 8 13:12 Modern Evaluation Frameworks
- 9 14:44 Factual Accuracy and Fact Score
- 10 20:32 Addressing Toxicity and Bias in AI
- 11 23:31 Human-Centric Evaluation Methods
- 12 24:35 Conclusion and Transition
- 13 25:38 Introduction to Evaluation Shifts
- 14 25:50 Strengths of Large Language Models in Evaluation
- 15 26:43 Automated Evaluation: Importance and Benefits
- 16 27:24 Key Technologies in Automated Evaluation
- 17 28:35 Deep Dive: OpenAI Evals Framework
- 18 32:40 Generative Evaluation (GE) Explained
- 19 35:44 Evaluation of Retrieval-Augmented Generation (RAG) Systems
- 20 38:19 Human Judgment vs. Automated Metrics
- 21 41:23 Building Robust Evaluation Ecosystems
- 22 43:05 Real-World Scenario: Evaluating Customer Service AI
- 23 45:22 Conclusion and Future of Evaluation