Beyond Gemini: Using Reinforcement Learning to Unlock Reliable AI Agents with Open LLMs
MLOps World: Machine Learning in Production via YouTube
Overview
In this 33-minute conference talk from MLOps World, Julien Launay, CEO and co-founder of Adaptive ML, explores how reinforcement learning (RL) can create reliable AI agents using open-source large language models. Learn how EdTech organization Alloprof developed an AI student support agent superior to Khanmigo by embedding domain expertise through RL fine-tuning rather than prompt engineering alone. Discover how smaller, open-weight models fine-tuned primarily on synthetic data consistently outperformed state-of-the-art models, including Gemini Coach. The presentation covers advanced techniques including dynamic retrieval-augmented generation (agentic RAG), adaptive communication strategies refined through iterative feedback, and synthetic data approaches such as self-play that eliminate the need for extensive real-world data collection. As the former technical lead behind the Falcon 40B and 180B LLMs and a contributor to BLOOM, Launay shares practical insights on democratizing reinforcement fine-tuning for production AI systems.
Syllabus
Beyond Gemini: Using RL to unlock reliable AI agents with open LLMs
Taught by
MLOps World: Machine Learning in Production