Training Small Language Models to Reason with Reinforcement Learning - GRPO from Scratch

Neural Breakdown with AVB via YouTube

YouTube videos curated by Class Central.

Classroom Contents

  1. 0:00 - Thinking LLMs are taking over!
  2. 3:47 - Setting up Reinforcement Learning Environment
  3. 4:50 - Reasoning Gym library - Rewards
  4. 8:00 - GRPO Visually explained
  5. 10:41 - Policy Optimization and PPO loss Explained
  6. 15:45 - Coding response generation
  7. 20:55 - Coding Reward Generation & Advantages
  8. 26:25 - Calculating log probabilities
  9. 30:58 - RL Training loop
  10. 33:49 - Visualizing log probabilities post training
  11. 36:01 - The GRPO and PPO Loss function
  12. 38:19 - Surrogate clipping
  13. 41:21 - Supervised Finetuning and LoRA training
  14. 43:26 - Reasoning SLM results!
  15. 45:36 - 10 Practical Tips for finetuning Reasoning SLMs
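The chapter list names the core GRPO pieces (reward generation and advantages, the GRPO/PPO loss, surrogate clipping) without showing them. Below is a minimal sketch in plain Python. It assumes the standard GRPO recipe rather than the video's exact code: the rewards for a group of responses sampled from the same prompt are normalized into advantages, and those advantages are plugged into a PPO-style clipped surrogate objective. The function names `group_advantages` and `clipped_surrogate_loss` are hypothetical. For simplicity, the sketch uses one summed log-probability per response, whereas real implementations usually work per token. It also omits the KL penalty term.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one group of
    responses sampled for the same prompt (assumption: population std;
    some implementations use the sample std instead)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, negated so it can be minimized.
    Simplification: each entry is the summed log-probability of one
    response, not a per-token value."""
    terms = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # probability ratio pi_new / pi_old, computed from log-probs
        ratio = math.exp(new_lp - old_lp)
        # keep the ratio inside [1 - clip_eps, 1 + clip_eps]
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # pessimistic (minimum) of the unclipped and clipped objectives
        terms.append(min(ratio * adv, clipped * adv))
    return -mean(terms)


if __name__ == "__main__":
    # Hypothetical group of 4 responses: two correct (reward 1), two wrong (0)
    adv = group_advantages([1.0, 0.0, 0.0, 1.0])
    print(adv)  # roughly [1, -1, -1, 1]
    # Before any update the new and old policies match, so the ratio is 1
    print(clipped_surrogate_loss([0.0] * 4, [0.0] * 4, adv))
```

Clipping limits how much credit a single update can take for moving a response's probability ratio away from 1. With a positive advantage and a ratio of 2, the objective is capped at `1 + clip_eps` times the advantage. When the old and new log-probabilities are identical, the ratio is 1, and the loss reduces to minus the mean advantage, which is zero for group-normalized advantages.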