Reinforcement Learning for Large Language Models - RLHF, PPO, DPO, and GRPO

Serrano.Academy via YouTube

Overview

Learn the reinforcement learning techniques used to train and fine-tune large language models in this video tutorial. Start with Reinforcement Learning with Human Feedback (RLHF) and how it is used to train and fine-tune Transformer models, then move on to Proximal Policy Optimization (PPO) for training large language models. See how Direct Preference Optimization (DPO) fine-tunes LLMs directly, without a reinforcement learning step, and how DeepSeek uses Group Relative Policy Optimization (GRPO) to train reasoning models. The tutorial also covers KL divergence, a measure of how different two probability distributions are, and offers a friendly introduction to deep reinforcement learning, Q-networks, and policy gradients.

Syllabus

Reinforcement Learning with Human Feedback (RLHF) - How to train and fine-tune Transformer Models
Proximal Policy Optimization (PPO) - How to train Large Language Models
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models
KL Divergence - How to tell how different two distributions are
A friendly introduction to deep reinforcement learning, Q-networks and policy gradients

Taught by

Serrano.Academy
