DeepSeek R1 Theory Overview - From GRPO to Reinforcement Learning and Supervised Fine-Tuning

Yacine Mahdid via YouTube

Classroom Contents

  1. Introduction: 0:00
  2. DeepSeek R1-Zero path: 2:23
  3. Reinforcement learning setup: 3:59
  4. Group Relative Policy Optimization (GRPO): 7:03
  5. DeepSeek R1-Zero result: 11:40
  6. Cold start supervised fine-tuning: 15:30
  7. Consistency reward for CoT: 16:19
  8. Supervised fine-tuning data generation: 17:17
  9. Reinforcement learning with neural reward model: 19:47
  10. Distillation: 21:26
  11. Conclusion: 24:34
