Exploring GRPO Through the RAFT Algorithm - RLHF and RLVR

Yacine Mahdid via YouTube

Overview

This study session examines Group Relative Policy Optimization (GRPO) through an analysis of the RAFT (Reward rAnked FineTuning) algorithm. It first covers how RAFT was used in 2023 to understand Proximal Policy Optimization (PPO) within the Reinforcement Learning from Human Feedback (RLHF) paradigm, then how it was applied in 2025 to analyze GRPO in Reinforcement Learning with Verifiable Rewards (RLVR). The two approaches are compared in terms of their underlying principles and effectiveness. The session concludes with insights from the paper "A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce", showing how algorithms with a much simpler structure can achieve GRPO-like performance.
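
As a rough illustration of the two ideas named above (not taken from the video), the sketch below contrasts a RAFT-style selection step with a GRPO-style group-relative advantage. It assumes a single prompt with a small group of sampled responses and one scalar reward per response; the function names `raft_select` and `grpo_advantages`, the `keep` and `eps` parameters, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def raft_select(responses, rewards, keep=1):
    """RAFT-style filtering (sketch): keep the `keep` highest-reward responses.

    The kept responses would then be used as ordinary fine-tuning targets.
    """
    ranked = sorted(zip(rewards, responses), key=lambda pair: pair[0], reverse=True)
    return [response for _, response in ranked[:keep]]


def grpo_advantages(rewards, eps=1e-8):
    """GRPO-style advantages (sketch): normalize rewards within one group.

    Each reward is centered by the group mean and scaled by the group
    standard deviation; `eps` is an assumed guard against division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled answers to one prompt, with a 0/1 verifier reward.
responses = ["a", "b", "c", "d"]
rewards = [1.0, 0.0, 0.0, 1.0]

selected = raft_select(responses, rewards, keep=2)
advantages = grpo_advantages(rewards)
```

In this toy group, the selection step discards the two zero-reward responses entirely, while the group-relative step keeps all four and assigns the incorrect ones a negative weight. When every reward in a group is identical, the advantages are all zero, so that group contributes no learning signal.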

Syllabus

Exploring GRPO Through the RAFT Algorithm (RLHF and RLVR)

Taught by

Yacine Mahdid
