Training LLMs to Think - Understanding o1 and DeepSeek-R1 Models

Shaw Talebi via YouTube

Classroom Contents

  1. Intro - 0:00
  2. OpenAI's o1 - 0:33
  3. Test-time Compute - 1:33
  4. "Thinking" Tokens - 3:50
  5. DeepSeek Paper - 5:58
  6. Reinforcement Learning - 7:22
  7. R1-Zero: Prompt Template - 9:28
  8. R1-Zero: Reward - 10:53
  9. R1-Zero: GRPO technical - 12:53
  10. R1-Zero: Results - 20:00
  11. DeepSeek R1 - 23:32
  12. Step 1: SFT with CoT - 24:47
  13. Step 2: R1-Zero Style RL - 26:14
  14. Step 3: SFT with Mixed Data - 27:03
  15. Step 4: RL & RLHF - 28:26
  16. Accessing DeepSeek Models - 29:18
  17. Conclusions - 30:10
