

Group Relative Policy Optimization (GRPO) - Formula and Implementation Tutorial

Yacine Mahdid via YouTube

Overview

Learn about Group Relative Policy Optimization (GRPO), the reinforcement-learning algorithm used to train DeepSeek-R1, through a tutorial that covers both its mathematical formulation and a practical implementation. The video compares PPO with GRPO, walks through each algorithm's objective, and then moves from pseudo-code to the GRPO trainer code written by the Hugging Face post-training team. Supporting resources include the Hugging Face documentation, GitHub repositories, the DeepSeekMath paper (which introduced GRPO), and related tutorials on GRPO and PPO. Suited to machine learning practitioners and researchers interested in reinforcement-learning techniques for training language models.
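To make the PPO vs. GRPO contrast concrete, here is a minimal Python sketch of GRPO's two core pieces: advantages computed relative to a group of completions for the same prompt (instead of from a learned critic), and a PPO-style clipped objective with a KL penalty toward a reference policy. It assumes one scalar reward and one summed log-probability per completion; the actual trainer works on per-token tensors. The helper names `group_relative_advantages` and `grpo_loss`, the clip range `clip_eps`, and the KL weight `beta` are hypothetical illustrative choices, not taken from the video or from Hugging Face's code.

```python
from math import exp, log
from statistics import mean, pstdev

# Hypothetical helpers: a sequence-level simplification of GRPO, not the TRL trainer.


def group_relative_advantages(rewards, eps=1e-8):
    """Score each completion against the others sampled for the same prompt:
    A_i = (r_i - mean(r)) / (std(r) + eps). No value function (critic) is used."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; other implementations may use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(logp_new, logp_old, logp_ref, advantages, clip_eps=0.2, beta=0.04):
    """Clipped surrogate plus a KL penalty toward a frozen reference policy,
    averaged over the completions in one group. clip_eps and beta are illustrative."""
    losses = []
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = exp(lp_new - lp_old)  # pi_theta / pi_theta_old
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)  # pessimistic PPO bound
        # KL estimator pi_ref/pi - log(pi_ref/pi) - 1, which is always >= 0
        log_ref_ratio = lp_ref - lp_new
        kl = exp(log_ref_ratio) - log_ref_ratio - 1
        losses.append(-(surrogate - beta * kl))  # negate: the optimizer minimizes
    return mean(losses)


# Four completions for one prompt, rewarded 1 if correct and 0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 4) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]

# A completion whose probability doubled since sampling: the ratio 2.0 is clipped to 1.2.
print(round(grpo_loss([log(2.0)], [0.0], [log(2.0)], [1.0]), 4))  # -1.2
```

Because the baseline is the group mean, a prompt where every completion receives the same reward yields zero advantage for all of them and contributes to the loss only through the KL term.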

Syllabus

- Introduction: 0:00
- PPO vs GRPO: 1:18
- PPO formula overview: 4:24
- GRPO formula overview: 7:49
- GRPO pseudo code: 11:11
- GRPO Trainer code: 13:21
- Conclusion: 23:48

Taught by

Yacine Mahdid
