
YouTube

Product Metrics are LLM Evals - Making AI Products More Accurate and Reliable

MLOps.community via YouTube

Overview

Explore how to build more accurate and reliable AI products through effective LLM evaluation strategies in this 53-minute conference talk featuring Raza Habib, CEO and co-founder of Humanloop. Learn to shorten feedback loops in your evaluations, rapidly iterate on prompts, and systematically test what works in production environments.

Discover practical approaches to system failure analysis and resolution, understand the challenges of deploying LLMs in real-world applications, and master the fundamentals of tracing and observability for AI systems. Examine techniques for optimizing model performance through strategic parameter tuning, explore the intersection of prompt engineering with psychological principles, and understand why data expertise is crucial for AI product success. Gain insights into configuration management for complex AI systems, identify key metrics that matter for customer-facing AI applications, and learn about private model deployment strategies.

Investigate how LLM agents are transforming conversational interfaces, uncover the hidden complexities of prompt management within existing frameworks, and compare streaming versus batch processing approaches for different use cases. Get an exclusive look at auto-tuning AI prototypes and understand how to architect smarter AI systems from the ground up. Throughout the discussion, discover why continuous feedback mechanisms are essential for AI product success, supported by insights from Anthropic's research and real-world case studies from companies like Duolingo, Vanta, and Gusto.

Syllabus

[00:00] Cracking Open System Failures and How We Fix Them
[05:44] LLMs in the Wild — First Steps and Growing Pains
[08:28] Building the Backbone of Tracing and Observability
[13:02] Tuning the Dials for Peak Model Performance
[13:51] From Growing Pains to Glowing Gains in AI Systems
[17:26] Where Prompts Meet Psychology and Code
[22:40] Why Data Experts Deserve a Seat at the Table
[24:59] Humanloop and the Art of Configuration Taming
[28:23] What Actually Matters in Customer-Facing AI
[33:43] Starting Fresh with Private Models That Deliver
[34:58] How LLM Agents Are Changing the Way We Talk
[39:23] The Secret Lives of Prompts Inside Frameworks
[42:58] Streaming Showdowns — Creativity vs. Convenience
[46:26] Meet Our Auto-Tuning AI Prototype
[49:25] Building the Blueprint for Smarter AI
[51:24] Feedback Isn’t Optional — It’s Everything

Taught by

MLOps.community

