Evals Aren't Useful? Really? - Why AI Agent Evaluations Determine Real-World Success

Learn why AI agent evaluations are critical for production success in this 25-minute conference talk, which challenges the misconception that evaluations aren't useful. Discover how proper testing, simulation, and iteration on failures, rather than simply bigger models or fancier prompts, determine whether AI agents succeed in real-world applications. Explore why many AI projects fail not because of model limitations but because of inadequate testing practices, and understand the difference between shipping genuine AI products and deploying untested experiments. Gain insights into the stress-testing methodologies and evaluation frameworks that separate successful AI implementations from failed deployments. The talk is presented by a data scientist from Prosus Group who cuts through industry hype to focus on practical evaluation strategies.