Beyond the Gold Standard - Evaluating and Trusting Agents in the Wild

MLOps.community via YouTube

Overview

Learn how to evaluate and deploy AI agents in production environments beyond traditional accuracy benchmarks through this 25-minute conference talk from the Coding Agents Conference. Discover the critical challenges of moving from controlled testing environments to real-world deployment, where agents encounter ambiguous data, edge cases, and complex workflows that don't exist in standard benchmarks. Explore technical strategies for building "living ground truth" systems that evolve with your deployed agents, incorporating structured feedback from subject matter experts to maintain reliability over time. Examine practical frameworks for auditing, measuring, and improving agent trustworthiness using healthcare examples where high accuracy is essential, such as validating clinical dates and bed levels, where 80% accuracy falls short of requirements. Understand how these reliability principles apply across industries, including e-commerce, fraud detection, and logistics, and how they address the fundamental questions of when an agent is production-ready and how to maintain its trustworthiness after deployment.

Syllabus

Beyond the Gold Standard: Evaluating and Trusting Agents in the Wild // Sanjana Sharma

Taught by

MLOps.community
