RAG Evaluation Is Broken! Here's Why and How to Fix It

Discover why current retrieval-augmented generation (RAG) evaluation methods are fundamentally flawed and learn how to build better assessment frameworks in this 11-minute conference talk. Explore the disconnect between high benchmark scores and real-world performance, and examine how traditional RAG evaluation metrics reward systems that fail on realistic information retrieval tasks. Understand the "chunking catch-22" and debunk the myth of perfectly contained information that plagues current benchmarking approaches. Learn, through practical examples like the "Seinfeld Test," why optimizing for local benchmarks, chunking strategies, and perfect retrieval scores often disappoints users. Gain actionable strategies for implementing RAG evaluation that reflects how information actually works in practice, so you can build systems that satisfy real users rather than just leaderboards and that bridge the gap between theoretical performance and practical utility.