Beyond BLEU and ROUGE - Evaluating LLMs and AI Systems


Conf42 via YouTube


  1. 00:00 Introduction and Session Overview
  2. 00:49 Speaker Introductions
  3. 01:47 Evaluating AI Systems: Beyond Traditional Metrics
  4. 02:28 Key Dimensions of AI Evaluation
  5. 04:50 Limitations of Traditional Metrics: BLEU and ROUGE
  6. 06:16 Real-World Examples Highlighting Metric Flaws
  7. 10:17 Understanding N-gram Overlap
  8. 13:12 Modern Evaluation Frameworks
  9. 14:44 Factual Accuracy and Fact Score
  10. 20:32 Addressing Toxicity and Bias in AI
  11. 23:31 Human-Centric Evaluation Methods
  12. 24:35 Conclusion and Transition
  13. 25:38 Introduction to Evaluation Shifts
  14. 25:50 Strengths of Large Language Models in Evaluation
  15. 26:43 Automated Evaluation: Importance and Benefits
  16. 27:24 Key Technologies in Automated Evaluation
  17. 17 28:35 Deep Dive: OpenAI Evolve Framework
  18. 32:40 Generative Evaluation (GE) Explained
  19. 35:44 Evaluation of Retrieval-Augmented Generation (RAG) Systems
  20. 38:19 Human Judgment vs. Automated Metrics
  21. 41:23 Building Robust Evaluation Ecosystems
  22. 43:05 Real-World Scenario: Evaluating Customer Service AI
  23. 45:22 Conclusion and Future of Evaluation
