Beyond Gemini: Using Reinforcement Learning to Unlock Reliable AI Agents with Open LLMs
MLOps World: Machine Learning in Production via YouTube
Overview
In this 33-minute conference talk from MLOps World, Julien Launay, CEO and co-founder of Adaptive ML, explores how reinforcement learning (RL) can produce reliable AI agents built on open-source large language models. Learn how the EdTech organization Alloprof developed an AI student support agent that outperforms Khanmigo by embedding domain expertise through RL fine-tuning rather than relying on prompt engineering alone. Discover how smaller, open-weight models fine-tuned primarily on synthetic data consistently outperformed state-of-the-art models, including Gemini Coach. The presentation covers advanced techniques including dynamic retrieval-augmented generation (agentic RAG), adaptive communication strategies refined through iterative feedback, and synthetic data approaches such as self-play that reduce the need for extensive real-world data collection. As the former technical lead behind the Falcon 40B and 180B LLMs and a contributor to BLOOM, Launay shares practical insights on democratizing reinforcement fine-tuning for production AI systems.
Syllabus
Beyond Gemini: Using RL to unlock reliable AI agents with open LLMs
Taught by
MLOps World: Machine Learning in Production