Group Relative Policy Optimization (GRPO) - Formula and Implementation Tutorial
Yacine Mahdid via YouTube
Overview
Learn about Group Relative Policy Optimization (GRPO), a key algorithm powering the DeepSeek R1 architecture, through a detailed tutorial that breaks down both the mathematical formulas and practical implementation. Explore the differences between PPO and GRPO algorithms, understand their respective formulas, and follow along with a comprehensive code walkthrough featuring the HuggingFace post-training team's implementation. Dive into detailed explanations spanning from theoretical foundations to practical pseudo-code and actual trainer code implementation. Access additional resources including HuggingFace documentation, GitHub repositories, the DeepSeek Math paper, and complementary tutorials to deepen your understanding of GRPO and PPO concepts. Perfect for machine learning practitioners and researchers interested in advanced optimization techniques in AI model training.
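For reference while following the formula sections, the GRPO objective as presented in the DeepSeek Math paper looks roughly like the following (notation here is a paraphrase and may differ slightly from the video's slides):

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta)
  = \mathbb{E}\Bigg[ \frac{1}{G}\sum_{i=1}^{G} \frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
    \Big( \min\big( r_{i,t}(\theta)\,\hat{A}_{i,t},\;
      \operatorname{clip}\big(r_{i,t}(\theta),\,1-\varepsilon,\,1+\varepsilon\big)\,\hat{A}_{i,t} \big)
    - \beta\, \mathbb{D}_{\mathrm{KL}}\big[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big] \Big) \Bigg],
\qquad
r_{i,t}(\theta) = \frac{\pi_\theta(o_{i,t} \mid q,\, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t} \mid q,\, o_{i,<t})},
\qquad
\hat{A}_{i,t} = \frac{r_i - \operatorname{mean}(\{r_j\}_{j=1}^{G})}{\operatorname{std}(\{r_j\}_{j=1}^{G})}
```

The key contrast with PPO is in the advantage term: PPO estimates it with a learned value network, while GRPO normalizes each completion's reward against the group of G completions sampled for the same prompt q.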
Syllabus
- Introduction: 0:00
- PPO vs GRPO: 1:18
- PPO formula overview: 4:24
- GRPO formula overview: 7:49
- GRPO pseudo-code: 11:11
- GRPO Trainer code: 13:21
- Conclusion: 23:48
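The pseudo-code and trainer-code sections revolve around the group-relative advantage computation. A minimal sketch of that step is below; this is an illustration based on the DeepSeek Math paper's description, not a copy of the HuggingFace trainer, whose implementation details may differ:

```python
import statistics

def group_relative_advantages(rewards):
    """Compute GRPO-style advantages for one prompt's group of completions.

    Each reward is normalized against the group's mean and standard
    deviation, replacing PPO's learned value network (critic) -- this is
    the central simplification GRPO makes.
    """
    mean = statistics.mean(rewards)
    # Guard: a single-completion group has no spread to normalize by.
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    if std == 0.0:
        # All rewards identical: every completion is average, advantage 0.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

In a full training step, these per-completion advantages would be broadcast to every token of the corresponding completion and fed into the clipped policy-gradient loss, with a KL penalty against a frozen reference model.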
Taught by
Yacine Mahdid