Training Small Language Models to Reason with Reinforcement Learning - GRPO from Scratch
Neural Breakdown with AVB via YouTube
Overview
Syllabus
0:00 - Thinking LLMs are taking over!
3:47 - Setting up Reinforcement Learning Environment
4:50 - Reasoning Gym library - Rewards
8:00 - GRPO Visually explained
10:41 - Policy Optimization and PPO loss Explained
15:45 - Coding response generation
20:55 - Coding Reward Generation & Advantages
26:25 - Calculating log probabilities
30:58 - RL Training loop
33:49 - Visualizing log probabilities post training
36:01 - The GRPO and PPO Loss function
38:19 - Surrogate clipping
41:21 - Supervised Finetuning and LoRA training
43:26 - Reasoning SLM results!
45:36 - 10 Practical Tips for finetuning Reasoning SLMs
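For orientation, the advantage and surrogate-clipping topics in the syllabus above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code from the video: the function names `group_relative_advantages` and `clipped_surrogate_loss` are hypothetical, log probabilities are treated as one number per response rather than per token, and the KL penalty that GRPO usually adds is left out.

```python
import math
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate loss, averaged over responses.

    Assumption: each log probability is a single value per response
    (a simplification of the usual per-token formulation).
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped objectives.
        terms.append(min(ratio * adv, clipped * adv))
    return -mean(terms)  # negate: optimizers minimize, the objective is maximized

# Example: four sampled responses, two of which earned a reward of 1.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
loss = clipped_surrogate_loss([-1.0, -2.0, -2.0, -1.0],
                              [-1.0, -2.0, -2.0, -1.0],
                              advantages)
```

When the new and old policies agree, every ratio is 1 and the loss is just the negated mean advantage, which is zero for a normalized group. The clip only starts to matter once the ratio leaves the interval [1 - clip_eps, 1 + clip_eps].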
Taught by
Neural Breakdown with AVB