Length Controlled Policy Optimization for Scaling Reinforcement Learning - CMU Research

Discover AI via YouTube

Overview

Explore Length Controlled Policy Optimization (LCPO), a reinforcement learning technique from Carnegie Mellon University, in this research video. LCPO is a simple method that optimizes for two things at once: answer accuracy and adherence to a length constraint the user specifies. CMU researchers used LCPO to train L1, a reasoning language model whose outputs satisfy the length constraint given in the prompt. LCPO builds on Group Relative Policy Optimization (GRPO), the reinforcement learning algorithm introduced in the DeepSeekMath paper and later used to train DeepSeek-R1. The presentation covers the research paper "L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning" by Pranjal Aggarwal and Sean Welleck of Carnegie Mellon University, and is aimed at viewers interested in AI research, AI agents, and policy optimization methods.
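To make the idea of optimizing for both accuracy and length adherence concrete, here is a minimal Python sketch. It assumes a reward of the form "correctness minus a scaled deviation from the target length", followed by the group-relative normalization that GRPO-style methods apply to a batch of sampled responses. The function names, the coefficient `alpha`, and the sample numbers are hypothetical and chosen for illustration; they are not taken from the video or from the authors' code.

```python
from statistics import mean, pstdev


def length_controlled_reward(is_correct, n_tokens, target_tokens, alpha=0.001):
    """Reward = correctness term minus a penalty for missing the target length.

    alpha (hypothetical value) sets how strongly length deviation is
    penalized relative to getting the answer right.
    """
    correctness = 1.0 if is_correct else 0.0
    return correctness - alpha * abs(target_tokens - n_tokens)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each response is scored relative to the group mean and scaled by the
    group standard deviation; eps avoids division by zero when all
    rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical responses to one prompt with a 500-token target:
    # (answer correct?, number of tokens generated)
    samples = [(True, 500), (True, 700), (False, 500), (False, 700)]
    rewards = [length_controlled_reward(ok, n, 500) for ok, n in samples]
    for sample, adv in zip(samples, group_relative_advantages(rewards)):
        print(sample, round(adv, 3))
```

Under these assumptions, a correct response that hits the target length receives the highest advantage in its group, while a correct but overlong response is ranked below it, which is the trade-off the overview describes.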

Syllabus

NEW L1 LLM w/ GRPO to LCPO for Scaling RL (CMU)

Taught by

Discover AI
