Overview
Learn practical strategies for testing and evaluating LLM-driven applications in this 20-minute conference talk from Code BEAM Europe 2025. Explore the challenges developers face when integrating large language models into products, particularly the problem of "confident nonsense" where AI systems provide fluent but incorrect or potentially harmful responses. Discover evaluation techniques ranging from basic BLEU and ROUGE metrics to more sophisticated aspect-based evaluation and retrieval scoring methods. Understand what metrics to measure, when to trust different evaluation approaches, and how to implement testing strategies that can catch problematic AI responses before they reach production users. Gain insights into building robust validation systems for applications that generate human language, moving beyond traditional unit testing to address the unique challenges of LLM integration.
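As a rough illustration of the kind of baseline metric the talk covers, the sketch below computes ROUGE-1 recall (the fraction of reference unigrams that appear in a model's answer). This is not code from the talk, just a minimal, dependency-free example of how such an overlap score can flag responses that diverge heavily from a trusted reference; the function name and threshold are illustrative assumptions.

```python
def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams found in the candidate.

    Illustrative sketch only -- production use would tokenize properly and
    combine this with precision (F1), or use a maintained metrics library.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens:
        return 0.0
    overlap = sum(1 for tok in ref_tokens if tok in cand_tokens)
    return overlap / len(ref_tokens)


# Hypothetical guardrail: flag answers whose overlap with a known-good
# reference falls below an arbitrary threshold chosen for this example.
reference = "the refund policy allows returns within 30 days"
answer = "returns are accepted within 30 days under the refund policy"
score = rouge1_recall(reference, answer)
print(f"ROUGE-1 recall: {score:.2f}, flagged: {score < 0.5}")
```

Note that high lexical overlap does not guarantee correctness (a fluent wrong answer can still score well), which is why the talk contrasts such surface metrics with aspect-based evaluation and retrieval scoring.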
Syllabus
Detecting Confident Nonsense: Testing LLM-Driven Apps - Hernan Rivas Acosta | Code BEAM Europe 2025
Taught by
Code Sync