Classroom Contents
Understanding DeepSeek R1 Reward Modeling and Verifiers for AI Training
- 1 00:00 - intro
- 2 00:53 - DeepSeek reward modelling
- 3 03:20 - format reward verifier
- 4 07:31 - accuracy reward verifier
- 5 09:43 - boxed reward verifier
- 6 12:11 - verifier answer verifier
- 7 13:39 - limerick verifier
- 8 16:25 - LLM verifiers
- 9 18:29 - evoking outputs DeepSeek style
- 10 19:07 - greedy sampling
- 11 23:10 - top-p sampling
- 12 30:00 - generating verifier datasets
- 13 33:00 - collecting prompts from teacher model DeepSeek
- 14 37:00 - SFT training on collected prompts
- 15 37:33 - inferring from trained model
- 16 38:50 - chain-of-thought quality