Training Agents with Reinforcement Learning

Explore the intersection of reinforcement learning and AI agent development in this 41-minute conference talk featuring Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave). Discover how reinforcement learning is being used to build smarter, more reliable AI agents as Corbitt shares OpenPipe's evolution from supervised fine-tuning to developing ART (Agent Reinforcement Trainer), their open-source RL toolkit. Learn about the technical distinctions between reinforcement learning and supervised fine-tuning, including weight movement patterns and model reliability improvements. Understand OpenPipe's approach to multi-turn agent training and tool use, and how their implementation differs from OpenAI's RFT methodology. Gain insights into keeping agent behavior consistent in production environments and the role RL plays in achieving that reliability. Examine strategies for avoiding reward hacking through Ruler, OpenPipe's LLM-based judging system, and explore cost-efficient approaches to RL training using serverless infrastructure. Delve into OpenPipe's long-term vision for self-improving agents and receive practical advice for AI startup founders navigating a rapidly evolving ecosystem. The talk also covers why reinforcement learning is becoming essential in modern agent development, practical applications in real-world scenarios, and startup lessons from YC's Startup School experience.