Completed
- Cold start supervised fine-tuning: 15:30
Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
DeepSeek R1 Theory Overview - From GRPO to Reinforcement Learning and Supervised Fine-Tuning
Automatically move to the next video in the Classroom when playback concludes
- 1 - Introduction: 0:00
- 2 - DeepSeek R1-zero path: 2:23
- 3 - Reinforcement learning setup: 3:59
- 4 - Group Relative Policy Optimization GRPO: 7:03
- 5 - DeepSeek R1-zero result: 11:40
- 6 - Cold start supervised fine-tuning: 15:30
- 7 - Consistency reward for CoT: 16:19
- 8 - Supervised Fine tuning data generation: 17:17
- 9 - Reinforcement learning with neural reward model: 19:47
- 10 - Distillation: 21:26
- 11 - Conclusion: 24:34