Reinforcement Learning with Human Feedback (RLHF), Clearly Explained
StatQuest with Josh Starmer via YouTube
Overview
This 18-minute educational video explains the complete process of training Large Language Models (LLMs) such as ChatGPT and DeepSeek, with a particular focus on Reinforcement Learning with Human Feedback (RLHF). It covers how LLMs are first pre-trained on massive text datasets, why additional training is needed before they can generate helpful and polite responses, and the three key stages of LLM development: pre-training, supervised fine-tuning, and RLHF. The video then breaks down the RLHF process in detail, explaining how reward models are trained and used to align AI responses with human preferences. Based on the original InstructGPT research paper, this StatQuest tutorial provides a clear, comprehensive explanation of how modern AI assistants are taught to provide useful responses to human prompts.
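The reward-model training the video describes can be sketched in a few lines. This is a minimal illustration (not code from the video) of the pairwise preference loss used in the InstructGPT paper: given two scalar reward scores for a prompt's two candidate responses, the loss is -log(sigmoid(r_chosen - r_rejected)), which pushes the model to score the human-preferred response higher. The function name is my own for illustration.

```python
import math

def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for training a reward model:
    -log(sigmoid(r_chosen - r_rejected)).
    The loss is small when the human-preferred response already
    scores higher, and large when the ranking is reversed."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the preferred response scores higher, the loss is small;
# when the reward model ranks the pair the wrong way, it is large.
low = reward_model_loss(2.0, -1.0)
high = reward_model_loss(-1.0, 2.0)
print(low < high)  # True
```

In practice the scores come from a neural network with a scalar head, and the loss is minimized by gradient descent over many human-labeled comparison pairs; the trained reward model then scores the LLM's outputs during the reinforcement-learning stage.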
Syllabus
0:00 Awesome song and introduction
2:25 Pre-Training an LLM
5:06 Supervised Fine-Tuning
7:35 Reinforcement Learning with Human Feedback (RLHF)
10:07 RLHF - training the reward model
15:02 RLHF - using the reward model
Taught by
StatQuest with Josh Starmer