Measure and Iterate on AI Application Performance Using W&B Weave

Learn how to measure and iterate on AI application performance using Weights & Biases Weave in this 14-minute tutorial video. It explains how to set up rigorous evaluations that help you iterate quickly, optimize across multiple dimensions, and deploy AI applications with confidence. Follow along as the video demos a support agent prototype, examines its traces in Weave, and explains how a Weave Evaluation is built from three components: the application under test, a dataset, and scorers. It then walks through the Python code that implements an evaluation, shows how to review and compare results in Weave, and closes with an improved version of the support agent. Together, these segments cover the full loop of iteration, evaluation, and optimization.

Syllabus

0:00 Iteration, evaluation, and optimization
1:55 A support agent prototype in action
2:47 Examining AI application traces in Weave
3:47 How Weave Evaluations work
4:33 Weave applications, datasets, and scorers
6:31 Python code for implementing Evaluations
10:46 Reviewing and comparing Evaluations results in Weave
12:55 An improved support agent in action
13:39 Conclusion and invitation to try W&B Weave
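The evaluation structure described above (run an application over a dataset, then score each output with one or more scorers and aggregate the results) can be sketched in plain Python. This is a minimal illustration under stated assumptions: it does not call the Weave library, whose actual API is covered in the video and the Weave documentation. The `support_agent` function, the dataset rows, and both scorers are hypothetical stand-ins invented for this sketch.

```python
from statistics import mean
from typing import Callable

# Hypothetical stand-in for the application under test: a support agent
# that maps a customer question to a reply. A real agent would call an LLM.
def support_agent(question: str) -> str:
    if "refund" in question.lower():
        return "You can request a refund from the Orders page within 30 days."
    return "Please contact support@example.com for help."

# Hypothetical evaluation dataset: each row holds an input and a keyword
# that a good answer is expected to mention.
dataset = [
    {"question": "How do I get a refund?", "expected_keyword": "refund"},
    {"question": "Where is my order?", "expected_keyword": "order"},
]

# Hypothetical scorers: each receives the dataset row and the application's
# output, and returns a dict mapping metric names to values.
def keyword_scorer(row: dict, output: str) -> dict:
    return {"contains_keyword": row["expected_keyword"] in output.lower()}

def length_scorer(row: dict, output: str) -> dict:
    return {"is_concise": len(output.split()) <= 25}

def evaluate(
    app: Callable[[str], str],
    rows: list[dict],
    scorers: list[Callable[[dict, str], dict]],
) -> dict[str, float]:
    """Run the app on every row, apply every scorer, and average each metric."""
    per_metric: dict[str, list[float]] = {}
    for row in rows:
        output = app(row["question"])
        for scorer in scorers:
            for name, value in scorer(row, output).items():
                per_metric.setdefault(name, []).append(float(value))
    return {name: mean(values) for name, values in per_metric.items()}

summary = evaluate(support_agent, dataset, [keyword_scorer, length_scorer])
print(summary)  # → {'contains_keyword': 0.5, 'is_concise': 1.0}
```

Keeping scorers as independent functions that each return named metrics is what lets one evaluation measure several dimensions at once (here, relevance and brevity), and rerunning the same dataset and scorers against a revised agent gives directly comparable summaries.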