Reinforcement Learning for Large Language Models - RLHF, PPO, DPO, and GRPO

Serrano.Academy via YouTube

Class Central Classrooms (beta): YouTube videos curated by Class Central.

Classroom Contents

  1. Reinforcement Learning with Human Feedback (RLHF) - How to train and fine-tune Transformer Models
  2. Proximal Policy Optimization (PPO) - How to train Large Language Models
  3. Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
  4. GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models
  5. KL Divergence - How to tell how different two distributions are
  6. A friendly introduction to deep reinforcement learning, Q-networks and policy gradients
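
The core objectives behind these lessons each fit in a few lines of code. As a companion to item 1, here is a minimal sketch of the pairwise (Bradley-Terry) loss commonly used to train the RLHF reward model on human preference pairs; the tensors are made-up illustrations, not course materials:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards, rejected_rewards):
    """Bradley-Terry pairwise loss: push the scalar reward of the
    human-preferred response above that of the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical scalar rewards for two preference pairs:
loss = reward_model_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
print(loss)
```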
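
For item 2, a sketch of PPO's clipped surrogate objective (Schulman et al., 2017), again with hypothetical inputs:

```python
import torch

def ppo_clip_loss(new_logps, old_logps, advantages, eps=0.2):
    """Clipped surrogate objective: limit how far the probability ratio
    between the current and the data-collecting policy can move the update."""
    ratio = torch.exp(new_logps - old_logps)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # Pessimistic bound: take the elementwise minimum, then negate for a loss.
    return -torch.min(unclipped, clipped).mean()

# Hypothetical per-token log-probs and advantage estimates:
loss = ppo_clip_loss(torch.tensor([-1.0, -2.0, -0.5]),
                     torch.tensor([-1.1, -1.9, -0.7]),
                     torch.tensor([0.5, -0.3, 1.2]))
print(loss)
```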
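
For item 3, a sketch of the DPO loss (Rafailov et al., 2023), which optimizes on preference pairs directly from policy and reference log-probabilities, with no separate reward model; all values below are invented for illustration:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a tensor of summed log-probabilities of a response
    under the trainable policy or the frozen reference model; beta scales
    how far the policy may drift from the reference."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical log-probabilities for a batch of two preference pairs:
loss = dpo_loss(torch.tensor([-12.0, -8.0]), torch.tensor([-15.0, -9.5]),
                torch.tensor([-13.0, -8.5]), torch.tensor([-14.0, -9.0]))
print(loss)
```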
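
For item 4, GRPO replaces PPO's learned value network (critic) with group-relative advantages: each of several responses sampled for the same prompt is scored against the mean reward of its own group. A minimal sketch of that normalization, with made-up rewards:

```python
import torch

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantage: standardize each response's reward
    against the mean and standard deviation of its sampling group."""
    r = torch.as_tensor(group_rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical rewards for 4 responses sampled for one prompt:
print(grpo_advantages([1.0, 0.0, 0.5, 1.0]))
```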
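
For item 5, KL divergence quantifies how different two distributions are; a NumPy sketch for the discrete case, with example distributions chosen here purely for illustration:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(kl_divergence(p, q))  # positive; 0 only when p == q
print(kl_divergence(q, p))  # KL is asymmetric: generally differs from KL(p||q)
```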
