Optimize AI Agents with Continuous Model Distillation and Evaluation Using a Data Flywheel
Overview
Learn to optimize AI agents through continuous model distillation and evaluation using NVIDIA's Data Flywheel Blueprint in this 16-minute technical demonstration. Explore a production-ready reference workflow, built on NVIDIA NeMo and NIM microservices, that continuously distills, fine-tunes, evaluates, and deploys smaller, more efficient language models using real-world agent traffic generated by larger LLMs. Discover the architecture and building blocks of the blueprint while following a practical walkthrough that replaces a production-grade Llama 70B model with a smaller, faster alternative for agent tool-calling use cases. Master the configuration of your own flywheel to optimize AI agents at scale using LoRA fine-tuning, in-context learning (ICL), zero-shot evaluation, and LLM-as-a-judge scoring. Gain insights into common AI agent challenges, including latency, cost, and performance at scale, through systematic model optimization.
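The workflow described above can be sketched as a simple loop: log production traffic from a large "teacher" model, fine-tune a smaller "student" on those traces, score the student against the teacher, and promote the student only if it clears a quality bar. The sketch below is a minimal illustration of that loop in plain Python; all names (`Trace`, `fine_tune_student`, `judge`, `flywheel_iteration`) are hypothetical stand-ins, not the Data Flywheel Blueprint's actual API, and the fine-tuning and LLM-as-a-judge steps are replaced with trivial placeholders.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """One logged production request: the prompt and the large model's output."""
    prompt: str
    teacher_output: str

def fine_tune_student(traces):
    """Placeholder for LoRA fine-tuning a smaller model on teacher traces.
    Here the 'student' is just a lookup table over the training prompts."""
    lookup = {t.prompt: t.teacher_output for t in traces}
    def student(prompt: str) -> str:
        return lookup.get(prompt, "UNKNOWN")
    return student

def judge(student_out: str, teacher_out: str) -> float:
    """Placeholder for LLM-as-a-judge scoring: exact match scores 1.0.
    A real judge would ask an LLM to grade the student's answer."""
    return 1.0 if student_out == teacher_out else 0.0

def flywheel_iteration(traces, promote_threshold=0.9):
    """One turn of the flywheel: fine-tune, evaluate, decide on promotion."""
    student = fine_tune_student(traces)
    scores = [judge(student(t.prompt), t.teacher_output) for t in traces]
    accuracy = sum(scores) / len(scores)
    return accuracy, accuracy >= promote_threshold

# Example: tool-calling traces logged from a larger model.
traces = [
    Trace("What is 2+2?", '{"tool": "calculator", "args": {"expr": "2+2"}}'),
    Trace("Weather in Paris?", '{"tool": "weather", "args": {"city": "Paris"}}'),
]
accuracy, promoted = flywheel_iteration(traces)
print(accuracy, promoted)  # → 1.0 True
```

In the actual blueprint, each of these placeholders maps to a NeMo or NIM microservice (data logging, customization, evaluation, and deployment); the loop structure is what makes it a "flywheel", since every production request feeds the next optimization cycle.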
Syllabus
00:00 - Introduction
00:33 - AI Agent Challenges
00:57 - Data Flywheel Overview
01:52 - NVIDIA Blueprints Overview
02:45 - NVIDIA NeMo Overview
03:37 - Data Flywheel Blueprint Overview
07:10 - Demo of the Data Flywheel Blueprint
08:50 - Deploying the Launchable
Taught by
NVIDIA Developer