DeepSeek R1 Theory Overview - From GRPO to Reinforcement Learning and Supervised Fine-Tuning
Yacine Mahdid via YouTube
Power BI Fundamentals - Create visualizations and dashboards from scratch
Save 40% on 3 months of Coursera Plus
Overview
Syllabus
- Introduction: 0:00
- DeepSeek R1-zero path: 2:23
- Reinforcement learning setup: 3:59
- Group Relative Policy Optimization GRPO: 7:03
- DeepSeek R1-zero result: 11:40
- Cold start supervised fine-tuning: 15:30
- Consistency reward for CoT: 16:19
- Supervised Fine tuning data generation: 17:17
- Reinforcement learning with neural reward model: 19:47
- Distillation: 21:26
- Conclusion: 24:34
Taught by
Yacine Mahdid