Beyond Gemini: Using Reinforcement Learning to Unlock Reliable AI Agents with Open LLMs

In this 33-minute conference talk from MLOps World, Julien Launay, CEO and co-founder of Adaptive ML, explores how reinforcement learning (RL) can produce reliable AI agents built on open-source large language models. Learn how the EdTech organization Alloprof developed an AI student-support agent that outperformed Khanmigo by embedding domain expertise through RL fine-tuning rather than through prompt engineering alone. Discover how smaller open-weight models, fine-tuned primarily on synthetic data, consistently outperformed state-of-the-art models, including Gemini Coach. The presentation covers techniques such as dynamic retrieval-augmented generation (agentic RAG), communication strategies refined through iterative feedback, and synthetic-data approaches like self-play that eliminate the need for extensive real-world data collection. Drawing on his experience as the former technical lead behind the Falcon 40B and 180B LLMs and as a contributor to BLOOM, Launay shares practical insights on democratizing reinforcement fine-tuning for production AI systems.