Building a DeepSeek R1-Style Reasoning LLM with GRPO Fine-Tuning

1littlecoder via YouTube

Overview

Learn how to build a DeepSeek R1-style reasoning large language model (LLM) in this experimental tutorial on fine-tuning with GRPO (Group Relative Policy Optimization). It walks through the reward functions used to train a math reasoner, which can be implemented in a free Google Colab notebook. The tutorial covers the technical details of training, shares failed training runs through a Weights & Biases dashboard, and examines two implementations: a standard one and one that requires an A100 GPU. Follow along with the provided Colab notebooks to see how an LLM can be fine-tuned for mathematical reasoning, keeping in mind that the process is experimental and success rates may vary.
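To make the idea of GRPO reward functions concrete, here is a minimal sketch in plain Python. It is not the tutorial's code: the `<reasoning>`/`<answer>` tag layout, the reward values (0.5 and 2.0), and all function names are assumptions chosen for illustration. It scores a group of completions sampled for one prompt and then normalizes the scores within that group, which is the "group relative" part of GRPO.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format: reasoning followed by a final answer, each in its own tag.
FORMAT_RE = re.compile(
    r"^<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """Small reward (assumed value 0.5) when the completion follows the tag layout."""
    return 0.5 if FORMAT_RE.match(completion.strip()) else 0.0

def correctness_reward(completion: str, target: str) -> float:
    """Larger reward (assumed value 2.0) when the extracted answer equals the target."""
    match = FORMAT_RE.match(completion.strip())
    if match and match.group(1).strip() == target.strip():
        return 2.0
    return 0.0

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps).

    Population standard deviation is used here as a simplification;
    real trainers may use a different estimator or epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Three sampled completions for the prompt "What is 2 + 2?" (made-up examples).
completions = [
    "<reasoning>2 + 2 = 4</reasoning><answer>4</answer>",
    "<reasoning>just a guess</reasoning><answer>5</answer>",
    "The answer is 4",
]
rewards = [format_reward(c) + correctness_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Completions that score above the group average receive a positive advantage and are reinforced, while those below it are discouraged. In practice a library trainer would compute the advantages and the policy update; only the reward functions are typically written by hand.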

Syllabus

Turn ANY LLM into a Mini Deepseek R1 Fine-Tuning with GRPO!!!

Taught by

1littlecoder
