
Fuzzing in the GenAI Era - AI System Evaluation and Quality Assurance

AI Engineer via YouTube

Overview

Explore a comprehensive 19-minute conference talk that redefines AI evaluation through the lens of fuzzing methodology, moving beyond traditional static dataset approaches to dynamic stress testing of AI systems. Learn how to identify and address the "last mile problem" in AI applications, where systems appear to work well in controlled environments but fail when exposed to real-world user interactions and edge cases. Discover the brittleness inherent in GenAI applications through concrete examples of chatbot failures and understand why standard evaluation methods fall short in capturing these vulnerabilities.

Master the concept of "Haizing" - a systematic approach to simulating unexpected user inputs at scale to uncover corner cases before deployment. Examine two critical components of robust AI evaluation: developing quality metrics that accurately capture human judgment criteria and generating diverse, representative stimuli that can expose potential system failures. Understand how to scale evaluation processes using AI agents as judges, balancing accuracy with latency considerations, and explore reinforcement learning techniques for tuning evaluation systems.

Differentiate between fuzzing and adversarial testing approaches in AI contexts, and see how simulation can be framed as prompt optimization. Analyze real-world case studies, including implementations at major European and Fortune 500 banks, demonstrating practical applications of these evaluation methodologies for voice agents and AI applications in financial services.
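The core loop the talk describes - mutate realistic inputs at scale, run them through the system, and use a judge to flag failures - can be sketched in plain Python. This is a hedged illustration, not the speaker's implementation: `mutate`, `haize`, the toy `bot`, and the toy `judge` are all hypothetical names invented here, and a real setup would call an LLM system and an LLM-based judge instead of these stubs.

```python
import random

def mutate(prompt, rng):
    """Apply a simple perturbation to simulate unexpected user input."""
    ops = [
        lambda s: s.upper(),                # shouting / odd casing
        lambda s: s + " " + s.split()[-1],  # word repetition
        lambda s: s.replace(" ", "  "),     # irregular whitespace
        lambda s: "pls " + s,               # informal register
    ]
    return rng.choice(ops)(prompt)

def haize(system, judge, seeds, n_cases=20, seed=0):
    """Fuzz `system` with mutated seed prompts; return cases the judge flags."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        stimulus = mutate(rng.choice(seeds), rng)
        response = system(stimulus)
        if not judge(stimulus, response):
            failures.append((stimulus, response))
    return failures

# Toy system under test: a bot that only matches lowercase "refund".
bot = lambda q: "refund issued" if "refund" in q else "sorry?"

# Toy judge encoding one quality criterion: if the user asks about a
# refund (in any casing), the response must acknowledge the refund.
judge = lambda q, r: not ("refund" in q.lower()) or ("refund" in r)

fails = haize(bot, judge, ["i want a refund", "cancel my order"])
```

The uppercase mutation exposes the bot's case-sensitivity bug - exactly the kind of corner case a static test set with well-formed inputs would miss.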

Syllabus

00:00 Introduction to Haizing
01:16 The "Last Mile Problem" in AI
02:47 The Brittleness of GenAI Applications
03:54 Examples of Brittle Chatbots
04:29 Inadequacy of Standard Evaluation Methods
06:09 Haizing: Simulating the Last Mile
08:43 Scaling Evaluation with Agents as Judges
09:29 Verdict: Accuracy vs. Latency
11:47 Scaling Evaluation with RL-Tuned Judges
14:06 Fuzzing vs. Adversarial Testing in AI
14:37 Simulation as Prompt Optimization
16:23 Case Study: Haizing a Major European Bank's AI App
17:05 Case Study: Haizing a F500 Bank's Voice Agents
17:46 Case Study: Scaling Voice Agent Evals with Verdict

Taught by

AI Engineer

