DeepSeek R1 Theory Overview - From GRPO to Reinforcement Learning and Supervised Fine-Tuning
Yacine Mahdid via YouTube
Syllabus
- Introduction: 0:00
- DeepSeek R1-Zero path: 2:23
- Reinforcement learning setup: 3:59
- Group Relative Policy Optimization (GRPO): 7:03
- DeepSeek R1-Zero result: 11:40
- Cold start supervised fine-tuning: 15:30
- Consistency reward for CoT: 16:19
- Supervised fine-tuning data generation: 17:17
- Reinforcement learning with neural reward model: 19:47
- Distillation: 21:26
- Conclusion: 24:34
Taught by
Yacine Mahdid