Overview
Dive into the mechanics of Group Relative Policy Optimization (GRPO) through an in-depth analysis of the RAFT (Reward rAnked FineTuning) algorithm in this study session. Examine how RAFT was used in 2023 to study Proximal Policy Optimization (PPO) within the Reinforcement Learning from Human Feedback (RLHF) paradigm, and how in 2025 it was applied to analyze GRPO in Reinforcement Learning with Verifiable Rewards (RLVR). Compare the two algorithms, their underlying principles, and their effectiveness. Conclude with insights from "A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce," which shows how to build algorithms that reach GRPO-like performance with a much simpler structure. Along the way, gain a practical understanding of the reinforcement learning techniques used to train and optimize large language models.
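To make the RAFT versus GRPO contrast concrete, here is a minimal sketch of the per-prompt step each one performs on a group of sampled responses. It assumes a binary verifier reward (1.0 for a verified-correct answer, 0.0 otherwise), as in the RLVR setting. Under that assumption, RAFT's "rank by reward and fine-tune on the best" reduces to keeping the correct samples. GRPO instead normalizes each reward by its group's mean and standard deviation. The function names, the `threshold` parameter, the use of the population standard deviation, and the `eps` constant are illustrative choices, not taken from the course or from any specific library.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage (GRPO-style sketch).

    Each response's reward is centered on the mean of its own group
    (all responses sampled for the same prompt) and scaled by the
    group's standard deviation. Population std and eps are assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def raft_select(responses, rewards, threshold=1.0):
    """RAFT-style rejection sampling (sketch, binary-reward case).

    Keep only responses whose reward reaches the threshold; the kept
    samples would then be used for ordinary supervised fine-tuning.
    The threshold is a hypothetical parameter for illustration.
    """
    return [resp for resp, r in zip(responses, rewards) if r >= threshold]


if __name__ == "__main__":
    responses = ["answer A", "answer B", "answer C", "answer D"]
    rewards = [1.0, 0.0, 1.0, 0.0]
    print("GRPO advantages:", grpo_advantages(rewards))
    print("RAFT keeps:", raft_select(responses, rewards))
```

With rewards `[1.0, 0.0, 1.0, 0.0]`, GRPO assigns advantages of roughly +1 to the correct responses and -1 to the incorrect ones, while RAFT simply discards the incorrect ones. If every response in a group receives the same reward, all GRPO advantages are zero, so that prompt contributes no learning signal.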
Syllabus
Exploring GRPO Through the RAFT algorithm (RLHF and RLVR)
Taught by
Yacine Mahdid