Overview
Learn to develop specialized evaluation frameworks for measuring the effectiveness of domain-specific AI agents in this 34-minute conference talk from Databricks. Explore methodologies that go beyond standard LLM benchmarks to assess agent quality across specialized knowledge domains, tailored workflows, and task-specific objectives, and discover practical approaches for designing robust LLM judges that align with business goals and provide meaningful insight into agent capabilities and limitations.

Master tools for creating domain-relevant evaluation datasets and benchmarks that accurately reflect real-world use cases, learn how to build LLM judges that measure domain-specific metrics, and apply strategies for interpreting judge results to drive iterative improvement in agent performance. The evaluation methodologies presented by Nikhil Thorat and Samraj Moorjani, Software Engineers at Databricks, can help turn domain-specific agents from experimental tools into trusted enterprise solutions with measurable business value.
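To make the LLM-judge pattern concrete, here is a minimal Python sketch of the kind of evaluation loop the talk covers: grading an agent's answers against a curated, domain-specific dataset using a rubric-driven judge prompt. The call_llm function, the rubric wording, and the JSON score format are illustrative assumptions for this sketch, not APIs or materials from the talk itself.

# Minimal sketch of an LLM-as-judge evaluator for a domain-specific agent.
# `call_llm` is a hypothetical stand-in for whatever chat-completion client
# you use; the rubric text and score format below are illustrative only.

import json
from dataclasses import dataclass

JUDGE_PROMPT = """You are an expert evaluator for {domain} tasks.
Question: {question}
Agent answer: {answer}
Reference answer: {reference}

Rate the agent answer from 1 (poor) to 5 (excellent) on domain accuracy,
and explain briefly. Respond as JSON: {{"score": <int>, "reason": "<text>"}}"""

@dataclass
class EvalRow:
    question: str
    answer: str       # produced by the agent under test
    reference: str    # curated answer from a domain expert

def judge(row: EvalRow, domain: str, call_llm) -> dict:
    """Ask the judge model to grade one agent response against its reference."""
    prompt = JUDGE_PROMPT.format(
        domain=domain,
        question=row.question,
        answer=row.answer,
        reference=row.reference,
    )
    raw = call_llm(prompt)   # hypothetical: returns the judge model's text
    return json.loads(raw)   # e.g. {"score": 4, "reason": "..."}

def evaluate(rows: list[EvalRow], domain: str, call_llm) -> float:
    """Mean judge score over a domain-specific evaluation dataset."""
    scores = [judge(r, domain, call_llm)["score"] for r in rows]
    return sum(scores) / len(scores)

In a setup like this, the domain expertise lives almost entirely in the judge prompt and the evaluation dataset; the surrounding scoring loop stays the same as you iterate on the agent.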
Syllabus
Creating LLM Judges to Measure Domain-Specific Agent Quality
Taught by
Databricks