Syllabus
⌨️ 0:00:00 Introduction
⌨️ 0:01:49 R1 Overview - Overview
⌨️ 0:03:52 R1 Overview - DeepSeek-R1-Zero path
⌨️ 0:05:32 R1 Overview - Reinforcement learning setup
⌨️ 0:08:36 R1 Overview - Group Relative Policy Optimization (GRPO)
⌨️ 0:13:04 R1 Overview - DeepSeek-R1-Zero results
⌨️ 0:16:53 R1 Overview - Cold-start supervised fine-tuning
⌨️ 0:17:44 R1 Overview - Consistency reward for CoT
⌨️ 0:18:35 R1 Overview - Supervised fine-tuning data generation
⌨️ 0:21:06 R1 Overview - Reinforcement learning with neural reward model
⌨️ 0:22:53 R1 Overview - Distillation
⌨️ 0:26:16 GRPO - Overview
⌨️ 0:26:55 GRPO - PPO vs GRPO
⌨️ 0:30:25 GRPO - PPO formula overview
⌨️ 0:33:25 GRPO - GRPO formula overview
⌨️ 0:36:48 GRPO - GRPO pseudo code
⌨️ 0:38:56 GRPO - GRPO Trainer code
⌨️ 0:49:24 KL Divergence - Overview
⌨️ 0:49:55 KL Divergence - KL Divergence in GRPO vs PPO
⌨️ 0:51:20 KL Divergence - KL Divergence refresher
⌨️ 0:55:32 KL Divergence - Monte Carlo estimation of KL divergence
⌨️ 0:56:43 KL Divergence - Schulman's blog post on approximating KL divergence
⌨️ 0:57:38 KL Divergence - k1 = log(q/p)
⌨️ 1:00:01 KL Divergence - k2 = 0.5*(log(p/q))^2
⌨️ 1:02:19 KL Divergence - k3 = p/q - 1 - log(p/q)
⌨️ 1:04:44 KL Divergence - Benchmarking
⌨️ 1:07:28 Conclusion
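The three estimators named in the KL Divergence chapters can be illustrated with a small Monte Carlo experiment. This is a minimal sketch, not code from the course. It assumes NumPy and uses two unit-variance Gaussians chosen only for illustration: q = N(0, 1) as the sampling distribution and p = N(0.1, 1). With x drawn from q and r = p(x)/q(x), the estimators are k1 = -log r, k2 = 0.5*(log r)^2, and k3 = (r - 1) - log r. k1 and k3 are unbiased estimates of KL[q, p]; k2 is biased but has low variance when p and q are close. The helper names `normal_logpdf` and `kl_estimators` are hypothetical.

```python
import numpy as np


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


def kl_estimators(logq, logp):
    """Per-sample estimators of KL[q, p], given log q(x) and log p(x) for samples x ~ q."""
    logr = logp - logq                # log r = log(p/q)
    k1 = -logr                        # log(q/p): unbiased, can be negative per sample
    k2 = 0.5 * logr ** 2              # 0.5*(log(p/q))^2: biased, always >= 0
    k3 = np.expm1(logr) - logr        # p/q - 1 - log(p/q): unbiased, always >= 0
    return k1, k2, k3


# Illustrative setup (assumption): two Gaussians with equal variance.
rng = np.random.default_rng(0)
mu_q, mu_p, sigma = 0.0, 0.1, 1.0
x = rng.normal(mu_q, sigma, size=500_000)   # samples drawn from q
logq = normal_logpdf(x, mu_q, sigma)
logp = normal_logpdf(x, mu_p, sigma)

# Closed-form KL[q, p] for two Gaussians with the same variance.
true_kl = (mu_q - mu_p) ** 2 / (2 * sigma ** 2)

for name, k in zip(("k1", "k2", "k3"), kl_estimators(logq, logp)):
    print(f"{name}: mean={k.mean():.5f}  std={k.std():.5f}  (true KL={true_kl:.5f})")
```

`np.expm1` is used so that r - 1 stays accurate when log r is close to zero. The DeepSeekMath paper that introduced GRPO uses the k3 form for its KL penalty, with q as the current policy and p as the reference policy; adding the zero-mean term r - 1 to k1 keeps the estimate unbiased while making every per-sample value non-negative.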
Taught by
freeCodeCamp.org