Understanding DeepSeek R1 Reward Modeling and Verifiers for AI Training

Chris Hay via YouTube

Classroom Contents

  1. 00:00 - intro
  2. 00:53 - deepseek reward modelling
  3. 03:20 - format reward verifier
  4. 07:31 - accuracy reward verifier
  5. 09:43 - boxed reward verifier
  6. 12:11 - verifier answer verifier
  7. 13:39 - limerick verifier
  8. 16:25 - llm verifiers
  9. 18:29 - evoking outputs deepseek style
  10. 19:07 - greedy sampling
  11. 23:10 - top p sampling
  12. 30:00 - generating verifier datasets
  13. 33:00 - collecting prompts from teacher model deepseek
  14. 37:00 - sft training on collected prompts
  15. 37:33 - inferring from trained model
  16. 38:50 - chain of thought quality
