GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models
Class Central Classrooms
YouTube videos curated by Class Central.
Classroom Contents
Reinforcement Learning for Large Language Models - RLHF, PPO, DPO, and GRPO
1. Reinforcement Learning with Human Feedback (RLHF) - How to train and fine-tune Transformer Models
2. Proximal Policy Optimization (PPO) - How to train Large Language Models
3. Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
4. GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models
5. KL Divergence - How to tell how different two distributions are
6. A friendly introduction to deep reinforcement learning, Q-networks and policy gradients