Understanding DeepSeek R1 and GRPO - A Technical Deep Dive

Oxen via YouTube

25:30 DeepSeek’s Aha Moment

Class Central Classrooms

YouTube videos curated by Class Central.

Classroom Contents

  1. 0:00 Preview
  2. 0:31 Intro to Arxiv Dives
  3. 3:42 Why Is R1 Important?
  4. 6:38 What Is a Reasoning Model?
  5. 8:55 What Are DeepSeek R1’s Contributions?
  6. 12:27 How DeepSeek-V3 Works
  7. 16:01 What Hardware Do You Need?
  8. 16:50 How DeepSeek-R1-Zero Works
  9. 17:23 How GRPO Works
  10. 25:30 DeepSeek’s Aha Moment
  11. 29:06 R1 on the ARC-AGI Benchmark
  12. 30:20 Self-Hosting DeepSeek
  13. 31:38 How DeepSeek-R1 Works
  14. 34:05 What Was the Cold-Start Data?
  15. 36:58 Rejection Sampling and Supervised Fine-Tuning
  16. 38:30 Helpfulness and Harmlessness Reinforcement Learning
  17. 39:45 Distilling Smaller Models
  18. 41:25 Distillation vs. Reinforcement Learning