YouTube videos curated by Class Central.
Classroom Contents
Fine-tuning LLMs on Human Feedback (RLHF + DPO)
- 1 Intro - 0:00
- 2 Base Models - 0:25
- 3 InstructGPT - 2:20
- 4 RL from Human Feedback (RLHF) - 5:18
- 5 Proximal Policy Optimization (PPO) - 9:20
- 6 Limitations of RLHF - 10:30
- 7 Direct Preference Optimization (DPO) - 11:50
- 8 Example: Fine-tuning Qwen on Title Preferences - 14:29
- 9 Step 1: Curate preference data - 17:49
- 10 Step 2: Fine-tuning with DPO - 20:53
- 11 Step 3: Evaluate fine-tuned model - 25:27