Class Central Classrooms (beta)
YouTube videos curated by Class Central.
Classroom Contents
DeepSeek R1 Theory Tutorial - Architecture, GRPO, KL Divergence
- 1 ⌨️ 0:00:00 Introduction
- 2 ⌨️ 0:01:49 R1 Overview - Overview
- 3 ⌨️ 0:03:52 R1 Overview - DeepSeek R1-zero path
- 4 ⌨️ 0:05:32 R1 Overview - Reinforcement learning setup
- 5 ⌨️ 0:08:36 R1 Overview - Group Relative Policy Optimization (GRPO)
- 6 ⌨️ 0:13:04 R1 Overview - DeepSeek R1-zero result
- 7 ⌨️ 0:16:53 R1 Overview - Cold start supervised fine-tuning
- 8 ⌨️ 0:17:44 R1 Overview - Consistency reward for CoT
- 9 ⌨️ 0:18:35 R1 Overview - Supervised fine-tuning data generation
- 10 ⌨️ 0:21:06 R1 Overview - Reinforcement learning with neural reward model
- 11 ⌨️ 0:22:53 R1 Overview - Distillation
- 12 ⌨️ 0:26:16 GRPO - Overview
- 13 ⌨️ 0:26:55 GRPO - PPO vs GRPO
- 14 ⌨️ 0:30:25 GRPO - PPO formula overview
- 15 ⌨️ 0:33:25 GRPO - GRPO formula overview
- 16 ⌨️ 0:36:48 GRPO - GRPO pseudo code
- 17 ⌨️ 0:38:56 GRPO - GRPO Trainer code
- 18 ⌨️ 0:49:24 KL Divergence - Overview
- 19 ⌨️ 0:49:55 KL Divergence - KL Divergence in GRPO vs PPO
- 20 ⌨️ 0:51:20 KL Divergence - KL Divergence refresher
- 21 ⌨️ 0:55:32 KL Divergence - Monte Carlo estimation of KL divergence
- 22 ⌨️ 0:56:43 KL Divergence - Schulman blog
- 23 ⌨️ 0:57:38 KL Divergence - k1 = log(q/p)
- 24 ⌨️ 1:00:01 KL Divergence - k2 = 0.5*(log(p/q))^2
- 25 ⌨️ 1:02:19 KL Divergence - k3 = p/q - 1 - log(p/q)
- 26 ⌨️ 1:04:44 KL Divergence - benchmarking
- 27 ⌨️ 1:07:28 Conclusion
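The three Monte Carlo KL estimators covered in chapters 23–26 (from Schulman's "Approximating KL Divergence" blog post) can be sketched as follows. This is a minimal illustration, not the tutorial's own code: it assumes samples x ~ q, the log-ratio log(p/q) computed for two unit-variance Gaussians, and checks each estimator's mean against the closed-form KL.

```python
import numpy as np

rng = np.random.default_rng(0)

# q = N(0, 1), p = N(0.1, 1); closed-form KL(q||p) = 0.5 * (0.1)^2 = 0.005
q_mean, p_mean = 0.0, 0.1
x = rng.normal(q_mean, 1.0, size=500_000)  # Monte Carlo samples from q

# log(p(x)/q(x)); the 0.5*log(2*pi) normalizers cancel for equal variances
logr = -0.5 * (x - p_mean) ** 2 + 0.5 * (x - q_mean) ** 2

k1 = -logr                     # k1 = log(q/p): unbiased, high variance, can go negative
k2 = 0.5 * logr ** 2           # k2 = 0.5*(log(p/q))^2: slightly biased, low variance
k3 = np.exp(logr) - 1 - logr   # k3 = p/q - 1 - log(p/q): unbiased and non-negative

true_kl = 0.5 * (p_mean - q_mean) ** 2
for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(f"{name}: mean={k.mean():.5f}  std={k.std():.4f}  (true KL={true_kl})")
```

k3 is the estimator used in the GRPO objective (chapter 19): like k1 it is unbiased, but each per-sample term is non-negative, which keeps its variance low. GRPO applies it with the policy and reference-model token probabilities in place of q and p here.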