Experiments in Scaling Reinforcement Learning with Verifiable Rewards
USC Information Sciences Institute via YouTube
Overview
This seminar presentation, delivered by Nathan Lambert of the Allen Institute for AI on April 4, 2025, explores cutting-edge developments in reinforcement learning with language models. It examines the implications of DeepSeek's R1 reasoning model and academic efforts to replicate its results. Topics include Reinforcement Learning with Verifiable Rewards (RLVR), scaling efforts for Ai2's OLMo and Tülu language models, and findings suggesting reinforcement learning may be more effective than commonly believed. Drawing on his background in model-based RL and robotics, Lambert provides historical context on language modeling while forecasting how the rapidly expanding language model industry might evolve. As a Senior Research Scientist and post-training lead at the Allen Institute for AI, Lambert brings expertise from his previous work building an RLHF research team at Hugging Face and his PhD research at UC Berkeley on machine learning and robotics.
Syllabus
Experiments in Scaling Reinforcement Learning with Verifiable Rewards
Taught by
USC Information Sciences Institute