Overview
This 35-minute video breaks down the reinforcement learning implementation behind DeepSeek R1, focusing on Group Relative Policy Optimization (GRPO). It traces the evolution from PPO to GRPO, shows how GRPO reduces memory usage during training, and explains the role of group relative advantages. Later chapters cover KL-divergence, the reward signals, and a practical application: training a Rust reasoner.
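As a rough illustration of the group relative advantage idea named above, the sketch below scores each sampled completion against the other completions for the same prompt, rather than against a learned value model. It is a minimal sketch, not the video's or DeepSeek's code: the function name, the `eps` term, the choice of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumption) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and those below it get a negative one, so no separate critic network is needed to supply a baseline.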
Syllabus
0:00 Intro
0:52 Recap of R1
2:35 Why is GRPO Important
3:41 From PPO to GRPO
7:31 Reducing Memory Usage with GRPO
12:23 Group Relative Advantages
20:41 KL-Divergence
27:53 The Reward Signals
31:09 Training a Rust Reasoner
Taught by
Oxen