
YouTube

Reinforcement Learning with Human Feedback (RLHF), Clearly Explained

StatQuest with Josh Starmer via YouTube

Overview

This 18-minute educational video explains the complete process of training Large Language Models (LLMs) such as ChatGPT and DeepSeek, with a particular focus on Reinforcement Learning with Human Feedback (RLHF). Learn how LLMs are first pre-trained on massive text datasets but require additional training to generate helpful and polite responses. Discover the three key stages of LLM development: pre-training, supervised fine-tuning, and RLHF. The video breaks down the RLHF process in detail, explaining how reward models are trained and then used to align AI responses with human preferences. Based on the original InstructGPT research paper, this StatQuest tutorial provides a clear, comprehensive explanation of how modern AI assistants are taught to provide useful responses to human prompts.
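The reward-model training the video describes can be sketched in a few lines. In the InstructGPT approach, human labelers rank pairs of model responses, and the reward model is trained so that the preferred response scores higher than the rejected one, using a pairwise ranking loss of the form -log(sigmoid(r_chosen - r_rejected)). Below is a minimal illustrative sketch of that loss in plain Python; the function name and the example scores are hypothetical, not from the video.

```python
import math

def reward_model_loss(score_chosen, score_rejected):
    # Pairwise ranking loss used to train the reward model (InstructGPT style):
    # the response humans preferred should score higher than the rejected one.
    # loss = -log(sigmoid(r_chosen - r_rejected))
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the reward model already ranks the preferred response well above the
# rejected one, the loss is near zero; when it ranks them backwards, the loss
# is large, pushing the model's scores toward the human ranking.
print(reward_model_loss(4.0, 1.0))  # small loss: ranking agrees with humans
print(reward_model_loss(1.0, 4.0))  # large loss: ranking is backwards
```

Training then consists of minimizing this loss over many human-ranked response pairs, so the scalar score the reward model assigns comes to reflect human preferences.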

Syllabus

0:00 Awesome song and introduction
2:25 Pre-Training an LLM
5:06 Supervised Fine-Tuning
7:35 Reinforcement Learning with Human Feedback (RLHF)
10:07 RLHF - training the reward model
15:02 RLHF - using the reward model
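The final syllabus item, using the reward model, corresponds to the reinforcement-learning step: the LLM is updated to maximize the reward model's score, while a KL-style penalty keeps it close to the supervised fine-tuned reference model so it does not drift into degenerate text. Here is a minimal sketch of that per-response objective, assuming a scalar reward and summed log-probabilities; the function name and `beta` value are illustrative, not taken from the video.

```python
def rlhf_objective(reward, logprob_policy, logprob_ref, beta=0.1):
    # Objective the policy maximizes during RLHF (InstructGPT style):
    # reward-model score minus a KL-style penalty that discourages the
    # policy from drifting too far from the reference (SFT) model.
    # beta is an assumed penalty weight for illustration.
    kl_penalty = logprob_policy - logprob_ref
    return reward - beta * kl_penalty

# If the policy assigns the same log-probability as the reference model,
# there is no penalty and the objective is just the reward.
print(rlhf_objective(1.0, -2.0, -2.0))
```

In practice this objective is optimized with a policy-gradient method such as PPO, but the core idea the video walks through is exactly this trade-off between chasing reward-model score and staying close to the fine-tuned model.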

Taught by

StatQuest with Josh Starmer

