
Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) - Math Explained

Outlier via YouTube

Overview

Explore the mathematical foundations of Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) in this comprehensive 25-minute video tutorial. Delve into these popular reinforcement learning methods that have gained prominence through their application in Large Language Models for post-training alignment with preference data. Begin with an intuitive understanding of the problem statement and initial objective, then progress through the analytical derivation of both algorithms. Master key concepts including return functions, value functions, and importance sampling while understanding how these techniques evolved from Trust Region Policy Optimization (TRPO). Follow the complete mathematical derivation from basic principles to the final objectives, with clear explanations of each step in the process. Access extensive supplementary resources including papers on TRPO, PPO, GRPO, and REINFORCE, along with additional materials on log-derivatives, reinforcement learning fundamentals, and importance sampling to deepen your understanding of these crucial optimization techniques used in modern AI systems.
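As context for the objectives the video derives, these are the standard forms from the PPO and GRPO papers (a sketch of the usual notation; the video's own symbols may differ):

```latex
% PPO clipped surrogate objective, with importance-sampling ratio r_t
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

% GRPO replaces the learned value-function baseline with a group-relative
% advantage computed over G sampled outputs for the same prompt:
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}
                 {\operatorname{std}(\{r_1,\dots,r_G\})}
```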

Syllabus

00:00 Introduction
01:17 Problem Statement
03:17 Intuitive Objective
04:07 Analytically Computable Objective
10:11 Return Function
12:07 Value Function
14:53 Importance Sampling
17:40 TRPO
19:16 PPO
21:15 GRPO
23:45 Summary
24:31 Outro
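The PPO and GRPO segments above can be sketched numerically. This is a minimal illustration of the two core computations, not the video's own code: standardizing rewards within a group (GRPO's advantage estimate) and evaluating the clipped surrogate objective on given ratios and advantages.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: standardize rewards within one sampled group."""
    r = np.asarray(rewards, dtype=float)
    # Small epsilon guards against a zero-variance group.
    return (r - r.mean()) / (r.std() + 1e-8)

def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    r = np.asarray(ratios, dtype=float)
    adv = np.asarray(advantages, dtype=float)
    clipped = np.clip(r, 1.0 - eps, 1.0 + eps)
    return float(np.minimum(r * adv, clipped * adv).mean())
```

For example, a ratio of 2.0 with advantage 1.0 and eps 0.2 is clipped to 1.2, which is how PPO bounds the size of a policy update without an explicit trust-region constraint.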

Taught by

Outlier

