Overview
Dive deep into the mathematical foundations of the KL divergence implementation in DeepSeek R1's GRPO through this 20-minute technical tutorial. Learn the key differences between how GRPO and PPO apply the KL divergence penalty, starting with a comprehensive refresher on the concept. Follow along with detailed explanations of Monte Carlo estimation and explore three key estimator formulations: the logarithmic ratio (k1), the squared logarithmic ratio (k2), and the difference-based approach (k3). Examine practical benchmarking results and gain valuable insights from Schulman's influential blog post on KL approximation. Perfect for machine learning practitioners seeking to understand the mathematical underpinnings of modern reinforcement learning algorithms.
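As background for the Monte Carlo estimation segment, the idea is that KL(q || p) = E_{x~q}[log q(x)/p(x)] can be approximated by sampling from q and averaging the log ratio. A minimal sketch with two made-up categorical distributions (the specific probabilities are illustrative, not taken from the video):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical categorical distributions (illustrative values only).
q = np.array([0.5, 0.3, 0.2])   # sampling distribution
p = np.array([0.2, 0.5, 0.3])   # reference distribution

# Exact KL(q || p) = sum_x q(x) * log(q(x) / p(x))
kl_exact = np.sum(q * np.log(q / p))

# Monte Carlo estimate: draw x ~ q, average log q(x)/p(x)
x = rng.choice(len(q), size=100_000, p=q)
kl_mc = np.mean(np.log(q[x] / p[x]))

print(f"exact KL: {kl_exact:.4f}")
print(f"MC estimate: {kl_mc:.4f}")
```

With enough samples the estimate converges to the exact value; the interesting question, covered in the video, is how the variance of different estimators compares.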
Syllabus
- Introduction: 0:00
- KL Divergence in GRPO vs PPO: 1:00
- KL Divergence refresher: 2:30
- Monte Carlo estimation of KL divergence: 6:42
- Schulman blog: 7:58
- k1 = log(q/p): 8:55
- k2 = 0.5 * (log(p/q))^2: 11:23
- k3 = p/q - 1 - log(p/q): 13:35
- Benchmarking: 15:58
- Takeaways: 18:43
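The three estimators from the syllabus can be compared numerically. Below is a sketch using the same style of setup as Schulman's blog post, with q = N(0, 1) and p = N(0.1, 1) so the true KL(q || p) is 0.005 (the Gaussian choice here is an illustrative assumption, not necessarily the video's exact benchmark):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample x ~ q = N(0, 1); reference p = N(0.1, 1).
mu_p = 0.1
x = rng.normal(0.0, 1.0, size=200_000)

# Per-sample log ratio log p(x)/q(x); closed form for unit-variance Gaussians.
logr = -0.5 * (x - mu_p) ** 2 + 0.5 * x ** 2

k1 = -logr                      # log(q/p): unbiased, but can go negative; high variance
k2 = 0.5 * logr ** 2            # always >= 0, but biased
k3 = np.exp(logr) - 1 - logr    # p/q - 1 - log(p/q): unbiased AND always >= 0

true_kl = 0.5 * mu_p ** 2       # KL between equal-variance Gaussians
for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(f"{name}: mean={k.mean():.5f}  std={k.std():.4f}")
print(f"true KL = {true_kl:.5f}")
```

Running this shows why k3 is attractive in practice (and why GRPO uses it): it matches k1's unbiasedness while staying non-negative and exhibiting much lower variance.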
Taught by
Yacine Mahdid