Group Relative Policy Optimization (GRPO) - Formula and Implementation Tutorial
Yacine Mahdid via YouTube
Overview
Learn about Group Relative Policy Optimization (GRPO), a key reinforcement learning algorithm behind DeepSeek R1, in a tutorial that covers both the mathematical formulas and a practical implementation. The video compares PPO and GRPO, gives an overview of each algorithm's formula, and then walks through GRPO pseudo-code and the GRPO trainer code written by the HuggingFace post-training team. Additional resources include the HuggingFace documentation, GitHub repositories, the DeepSeek Math paper, and complementary tutorials on GRPO and PPO. The tutorial is aimed at machine learning practitioners and researchers interested in optimization techniques for training AI models.
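To make the "group relative" idea concrete before watching, here is a minimal sketch of the advantage computation described in the DeepSeek Math paper: each sampled completion's reward is normalized against the mean and standard deviation of the rewards in its own group, which lets GRPO drop the separate value model that PPO relies on. This is not the HuggingFace trainer code. The function name `group_relative_advantages`, the `eps` term, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's statistics.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` is a hypothetical stabilizer that avoids division
    by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a binary correctness reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and those below get a negative one. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.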
Syllabus
- Introduction: 0:00
- PPO vs GRPO: 1:18
- PPO formula overview: 4:24
- GRPO formula overview: 7:49
- GRPO pseudo code: 11:11
- GRPO Trainer code: 13:21
- Conclusion: 23:48
Taught by
Yacine Mahdid