KL Divergence Implementation in DeepSeek R1 - A Deep Learning Tutorial

Yacine Mahdid via YouTube

Classroom Contents

  1. Introduction: 0:00
  2. KL Divergence in GRPO vs PPO: 1:00
  3. KL Divergence refresher: 2:30
  4. Monte Carlo estimation of KL divergence: 6:42
  5. Schulman blog: 7:58
  6. k1 = log(q/p): 8:55
  7. k2 = 0.5 * (log(p/q))^2: 11:23
  8. k3 = p/q - 1 - log(p/q): 13:35
  9. benchmarking: 15:58
  10. takeaways: 18:43
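
The three estimators named in chapters 6 to 8 can be sketched in NumPy as below. This is a minimal sketch, not the video's code: it assumes samples x are drawn from q and that the estimators target KL(q || p), with r = p(x)/q(x). The helper names `kl_estimators` and `normal_logpdf`, and the choice of two unit-variance Gaussians as q and p, are illustrative assumptions.

```python
import numpy as np


def kl_estimators(logp, logq):
    """Per-sample k1, k2, k3 estimates of KL(q || p), given x ~ q.

    logp and logq are log-densities evaluated at the same samples x.
    """
    log_r = logp - logq            # log r, with r = p(x) / q(x)
    k1 = -log_r                    # log(q/p): unbiased, individual values can be negative
    k2 = 0.5 * log_r ** 2          # 0.5 * (log(p/q))^2: biased, always non-negative
    k3 = np.expm1(log_r) - log_r   # p/q - 1 - log(p/q): unbiased, always non-negative
    return k1, k2, k3


def normal_logpdf(x, mean, std=1.0):
    """Log-density of N(mean, std^2); hypothetical helper for this sketch."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


# Assumed toy setup: q = N(0, 1), p = N(0.1, 1).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200_000)    # samples from q
logq = normal_logpdf(x, 0.0)
logp = normal_logpdf(x, 0.1)

# Closed form for equal-variance Gaussians: (difference of means)^2 / 2.
true_kl = 0.1 ** 2 / 2

k1, k2, k3 = kl_estimators(logp, logq)
for name, k in (("k1", k1), ("k2", k2), ("k3", k3)):
    print(f"{name}: mean={k.mean():.5f}  std={k.std():.5f}  (true KL={true_kl})")
```

In this toy setup the sample means of all three estimators should land close to the closed-form value of 0.005, while k3 shows a much smaller spread than k1 and never goes meaningfully below zero.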
