Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Training Small Language Models to Reason with Reinforcement Learning - GRPO from Scratch
- 1 0:00 - Thinking LLMs are taking over!
- 2 3:47 - Setting up Reinforcement Learning Environment
- 3 4:50 - Reasoning Gym library - Rewards
- 4 8:00 - GRPO Visually explained
- 5 10:41 - Policy Optimization and PPO loss Explained
- 6 15:45 - Coding response generation
- 7 20:55 - Coding Reward Generation & Advantages
- 8 26:25 - Calculating log probabilities
- 9 30:58 - RL Training loop
- 10 33:49 - Visualizing log probabilities post training
- 11 36:01 - The GRPO and PPO Loss function
- 12 38:19 - Surrogate clipping
- 13 41:21 - Supervised Finetuning and LoRA training
- 14 43:26 - Reasoning SLM results!
- 15 45:36 - 10 Practical Tips for finetuning Reasoning SLMs