Self Learning AI - Accelerating with New Reinforcement Learning Techniques

Discover AI via YouTube

Overview

Explore a 25-minute video on policy collapse in self-supervised reinforcement learning for large language models, and on a proposed fix: momentum-anchored policy optimization. Learn how the frontier of LLM research has shifted toward post-training and System 2 reasoning, where the goal is to replicate o1-level performance by moving beyond supervised fine-tuning to reinforcement learning with verifiable rewards.

Understand why current self-supervised methods are unstable: as models train on their own pseudo-labels, they begin to "game" the reward signal, which leads to overconfidence, entropy collapse, and degraded reasoning performance. Examine the mathematical evidence that the standard industry approach of scaling up rollout samples only delays this crash and cannot prevent it.

Discover the M-GRPO (Momentum-Anchored Policy Optimization) framework, which changes how models interact with their training history in order to avoid policy collapse and reach state-of-the-art performance where previous baselines failed. Gain insights into how this approach supports self-supervised reinforcement learning in which models generate their own questions, verify reasoning chains, and keep improving without expensive human annotations. The video is based on research from Shanghai Innovation Institute, Fudan University, Shanghai AI Laboratory, and The Chinese University of Hong Kong.
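To make the terms in the overview more concrete, here is a minimal Python sketch of three generic ingredients that such methods are commonly built from. It is an illustration under stated assumptions, not the M-GRPO algorithm from the video or the underlying research: the function names (majority_vote_rewards, answer_entropy, ema_update), the use of majority voting to form pseudo-labels, and the reading of "momentum" as an exponential moving average of weights are all assumptions made for this example.

```python
import math
from collections import Counter


def majority_vote_rewards(answers):
    """Assumed pseudo-labelling scheme: the most common answer among the
    rollouts is treated as the label, and matching rollouts get reward 1.0."""
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution.
    A value of zero means every rollout produced the same answer."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def ema_update(anchor, current, momentum=0.99):
    """Assumed form of a momentum update: the slowly moving anchor weights
    take a small step toward the current policy weights."""
    return [momentum * a + (1.0 - momentum) * w for a, w in zip(anchor, current)]


# Hypothetical rollouts for one question, early and late in training.
diverse = ["12", "15", "12", "9"]
collapsed = ["12", "12", "12", "12"]

label, rewards = majority_vote_rewards(diverse)
print(label, rewards)
print(answer_entropy(diverse), answer_entropy(collapsed))
print(ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9))
```

In this toy setting, the collapsed batch agrees with itself perfectly, so every rollout earns full reward even though nothing has checked that "12" is correct. That is the kind of self-reinforcing overconfidence the overview describes.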

Syllabus

Self Learning AI: Accelerate w/ new RL

Taught by

Discover AI
