Training a Reasoning Model Using DeepSeek with 7GB VRAM - A Fast Fine-tuning Guide
Machine Learning With Hamza via YouTube
Overview
Learn to fine-tune Large Language Models (LLMs) for reasoning tasks in this 27-minute tutorial video that demonstrates using the GRPO reinforcement learning algorithm with minimal GPU requirements. Explore the complete process from environment setup to testing results, including detailed explanations of GRPO methodology, data preparation, model configuration, and reward function implementation. Master local LLM fine-tuning using the Unsloth fast fine-tuning Python library, requiring only 7GB of VRAM. Follow along with practical demonstrations of training procedures, analyze training outcomes, and understand how to test the fine-tuned model effectively. Access comprehensive resources including GitHub repositories, Hugging Face documentation, and Unsloth notebooks to support the implementation process.
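The reward-function implementation mentioned above typically combines several small scoring functions, each checking one property of a sampled completion. A minimal sketch in plain Python (the tag format and reward values here are illustrative assumptions, not the exact ones from the video):

```python
import re

# Hypothetical reward functions in the style commonly used for GRPO
# fine-tuning on reasoning tasks. The <answer> tag convention and the
# reward magnitudes are assumptions for illustration.

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def format_reward(completion: str) -> float:
    """Reward completions that wrap their final answer in <answer> tags."""
    return 1.0 if ANSWER_RE.search(completion) else 0.0

def correctness_reward(completion: str, target: str) -> float:
    """Reward completions whose extracted answer matches the reference."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == target.strip() else 0.0

def total_reward(completion: str, target: str) -> float:
    """Combine the individual signals into one scalar reward."""
    return format_reward(completion) + correctness_reward(completion, target)
```

Weighting correctness higher than formatting nudges the model to produce verifiably right answers rather than merely well-formatted ones.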
Syllabus
00:00 Intro
01:02 Explaining GRPO
08:03 Environment Setup guidelines
10:20 Data, Model & Reward functions
17:57 Training
21:24 Training results
23:47 Testing
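The core of the GRPO methodology explained at 01:02 is that each prompt gets a group of sampled completions, and every completion's reward is normalized against the group's mean and standard deviation, so no separate critic model is needed. A simplified sketch of that normalization step (an illustration of the idea, not the video's code):

```python
from statistics import mean, stdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages: normalize each completion's reward by
    the mean and standard deviation of its sampling group. Completions
    better than the group average get positive advantage, worse ones
    negative; the policy is then updated to favor the former."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because advantages are relative within each group, the reward functions only need to rank completions sensibly; their absolute scale does not matter.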
Taught by
Machine Learning With Hamza