Overview
Learn how to create a DeepSeek R1-style reasoning Large Language Model (LLM) in this experimental tutorial on GRPO-based fine-tuning. Explore the reward functions used to build a Math Reasoner, all of which can be implemented in a free Google Colab notebook. Dive into the technical aspects of training, including failed training runs that can be inspected on a Weights & Biases dashboard, and examine both the standard implementation and one that requires an A100 GPU. Follow along with the provided Colab notebooks to see how any LLM can be turned into a mathematical reasoner, keeping in mind that the process is experimental and success rates may vary.
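To make the idea of GRPO reward functions more concrete, here is a minimal sketch in plain Python. It is not the code from the tutorial's notebooks: the `<think>`/`<answer>` tag format, the reward values (0.5 and 2.0), and all function names are assumptions chosen for illustration. It shows two rule-based rewards (output format and answer correctness) and the group-relative normalization that GRPO applies to rewards for several completions sampled from the same prompt.

```python
import re
from statistics import mean, pstdev

# Assumed output format (hypothetical): reasoning inside <think> tags,
# followed by the final answer inside <answer> tags.
FORMAT_RE = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Small reward when the completion follows the assumed tag format."""
    return 0.5 if FORMAT_RE.match(completion.strip()) else 0.0


def correctness_reward(completion: str, target: str) -> float:
    """Larger reward when the extracted answer equals the reference answer."""
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == target.strip() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for the prompt "What is 2 + 2?" (made-up examples).
completions = [
    "<think>2 + 2 = 4</think><answer>4</answer>",  # right format, right answer
    "<think>2 + 2 = 5</think><answer>5</answer>",  # right format, wrong answer
    "The answer is 4",                             # no tags at all
]
rewards = [format_reward(c) + correctness_reward(c, "4") for c in completions]
advantages = group_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.2f} {completion!r}")
```

In this sketch only the first completion ends up with a positive advantage, so a policy update would push the model toward well-formatted, correct answers and away from the other two. A real training run would pass reward functions like these to a GRPO trainer and score model-generated completions instead of hard-coded strings.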
Syllabus
Turn ANY LLM into a Mini Deepseek R1 Fine-Tuning with GRPO!!!
Taught by
1littlecoder