Generating Laughter - Testing and Evaluating the Success of LLMs for Comedy

Explore testing methodologies for nondeterministic AI models through the lens of comedy generation in this 17-minute conference talk. Learn how traditional benchmarks fall short when evaluating large language models (LLMs) for creative applications, drawing on real-world experience running New York Times-featured generative AI comedy shows. Discover multi-tiered feedback loops, chaos testing, and exploratory user testing approaches that evaluate AI outputs on adaptability and contextual resonance rather than rigid accuracy standards. Understand the importance of establishing a reliable "root source of truth," whether a dataset or a core principle, to maintain consistency while embracing the creative unpredictability that makes AI-generated content engaging. Gain practical insights into balancing consistency and creativity in generative AI applications, with methods that apply beyond comedy to more functional use cases. Whether you're interested in creative AI applications or seeking new approaches to testing nondeterministic models, see how embracing unpredictability can produce resonant results across different contexts.