
DeepSeek R1 Theory Overview - From GRPO to Reinforcement Learning and Supervised Fine-Tuning

Yacine Mahdid via YouTube

Overview

Learn about the training methodology behind DeepSeek R1 in this tutorial video, which breaks the dense paper down into digestible segments. Explore the complete training pipeline, starting with the R1-zero path and progressing through the reinforcement learning setup, Group Relative Policy Optimization (GRPO), supervised fine-tuning, and neural reward models. Gain insights into cold-start supervised fine-tuning, consistency rewards for Chain of Thought (CoT), data generation for fine-tuning, and the final distillation phase. A visualization map accompanies each stage of the pipeline, and references to additional resources are provided for a deeper understanding of specific concepts like GRPO.
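To give a flavor of the GRPO idea covered in the video: instead of training a separate value critic, GRPO samples a group of responses per prompt and normalizes each response's reward against the group's mean and standard deviation to obtain advantages. The sketch below is a minimal, hypothetical illustration of that normalization step only (function name and example rewards are invented for illustration), not the full GRPO objective.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Assumption: each of the sampled responses to one prompt has already been
# scored by a reward function (e.g. a rule-based correctness check).
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each reward against the group's mean and std deviation."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma or 1.0  # guard against zero std when all rewards are equal
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to the same prompt, two scored correct (1.0)
# and two incorrect (0.0); correct answers receive positive advantages.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the advantages are centered within each group, responses that beat their siblings are reinforced and the rest are suppressed, with no learned baseline needed.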

Syllabus

- Introduction: 0:00
- DeepSeek R1-zero path: 2:23
- Reinforcement learning setup: 3:59
- Group Relative Policy Optimization (GRPO): 7:03
- DeepSeek R1-zero result: 11:40
- Cold start supervised fine-tuning: 15:30
- Consistency reward for CoT: 16:19
- Supervised fine-tuning data generation: 17:17
- Reinforcement learning with neural reward model: 19:47
- Distillation: 21:26
- Conclusion: 24:34

Taught by

Yacine Mahdid
