Beyond Gemini: Using Reinforcement Learning to Unlock Reliable AI Agents with Open LLMs
MLOps World: Machine Learning in Production via YouTube
Overview
In this 33-minute conference talk from MLOps World, Julien Launay, CEO and co-founder of Adaptive ML, explores how reinforcement learning (RL) can create reliable AI agents using open-source large language models. Learn how EdTech organization Alloprof developed an AI student support agent superior to Khanmigo by embedding domain expertise through RL fine-tuning rather than prompt engineering alone. Discover how smaller, open-weight models fine-tuned with primarily synthetic data consistently outperformed state-of-the-art models including Gemini Coach. The presentation covers advanced techniques including dynamic retrieval-augmented generation (agentic RAG), adaptive communication strategies refined through iterative feedback, and synthetic data approaches like self-play that eliminate the need for extensive real-world data collection. As the former technical lead behind the Falcon 40B and 180B LLMs and a contributor to BLOOM, Launay shares practical insights on democratizing reinforcement fine-tuning for production AI systems.
Syllabus
Beyond Gemini: Using RL to unlock reliable AI agents with open LLMs
Taught by
MLOps World: Machine Learning in Production