Overview
Learn Direct Preference Optimization (DPO), a method for preference-tuning large language models that removes the need for a separately trained reward model by optimizing directly on preference data. The tutorial states the problem, then walks through the complete mathematical derivation from the initial concept to the final objective. Along the way it shows why DPO offers more efficient, more streamlined training than methods like PPO and GRPO: there is no reward model to train.
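For a rough sense of where the derivation ends up, here is a minimal sketch of the commonly published DPO loss for a single preference pair: the negative log-sigmoid of beta times the difference between the policy-to-reference log-ratios of the chosen and rejected responses. The function name, argument names, and the default beta of 0.1 are illustrative assumptions and are not taken from the video.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response
    under either the policy being trained or the frozen reference model.
    All names here are hypothetical, chosen for illustration only.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin: positive when the policy favors the chosen
    # response more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy shifted toward the chosen response: the loss is lower.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Only log-probabilities from the policy and a frozen reference model appear in the loss; no reward model is evaluated, which is the simplification the overview describes.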
Syllabus
00:00 Introduction
01:02 Problem Statement
03:08 Derivation
16:21 Outro
Taught by
Outlier