Experiments in Scaling Reinforcement Learning with Verifiable Rewards
USC Information Sciences Institute via YouTube
Overview
This seminar presentation, delivered by Nathan Lambert of the Allen Institute for AI on April 4, 2025, explores cutting-edge developments in reinforcement learning with language models. It examines the implications of DeepSeek's R1 reasoning model and academic efforts to replicate its results, covering Reinforcement Learning with Verifiable Rewards (RLVR), scaling efforts for Ai2's OLMo and Tülu language models, and findings suggesting reinforcement learning may be more effective than commonly believed. Drawing on his background in model-based RL and robotics, Lambert provides historical context on language modeling while forecasting how the rapidly expanding language model industry might evolve. As a Senior Research Scientist and post-training lead at the Allen Institute for AI, Lambert brings expertise from his previous work building an RLHF research team at Hugging Face and his PhD research at UC Berkeley on machine learning and robotics.
Syllabus
Experiments in Scaling Reinforcement Learning with Verifiable Rewards
Taught by
USC Information Sciences Institute