
Direct Preference Optimization - Paper Explained

Outlier via YouTube

Overview

Learn Direct Preference Optimization (DPO), a method for preference-tuning large language models that eliminates the need for a separately trained reward model by optimizing directly on preference data. Explore the complete mathematical derivation from the initial problem setup to the final objective, and see how DPO offers more efficient training than methods like PPO and GRPO. The tutorial lays out the problem statement, walks through the derivation in detail, and shows why removing reward-model training makes the language model fine-tuning pipeline more streamlined.
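The final objective the video derives can be sketched in a few lines. Below is a minimal, illustrative implementation of the per-pair DPO loss, assuming the summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model have already been computed; all names and the example values are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Computes  -log sigmoid(beta * (margin_chosen - margin_rejected)),
    where each margin is the policy log-prob minus the reference log-prob.
    beta controls how far the policy may drift from the reference model.
    """
    margin_chosen = policy_chosen_logp - ref_chosen_logp
    margin_rejected = policy_rejected_logp - ref_rejected_logp
    logits = beta * (margin_chosen - margin_rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x)); log1p keeps this numerically stable
    return math.log1p(math.exp(-logits))

# Hypothetical log-probs: the policy favors the chosen response
# relative to the reference, so the loss is below log(2).
loss = dpo_loss(-1.0, -3.0, -1.5, -2.5, beta=0.1)
```

Note that no reward model appears anywhere: the implicit reward is the log-ratio between policy and reference, which is exactly the simplification the derivation in the video arrives at.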

Syllabus

00:00 Introduction
01:02 Problem Statement
03:08 Derivation
16:21 Outro

Taught by

Outlier

