
YouTube

AI FALLS - DPO RL Crumbles

Discover AI via YouTube

Overview

Explore research from Princeton University and the University of Illinois that reveals critical flaws in the implicit reward models used for reinforcement learning alignment. Discover why Direct Preference Optimization (DPO) and similar implicit-reward approaches fail to generalize effectively, while the traditional explicit reward models used in Reinforcement Learning from Human Feedback (RLHF) continue to perform well. Examine the performance gap between the two approaches and the underlying mechanisms that cause implicit reward models to struggle with generalization. Learn about the findings of researchers Noam Razin, Yong Lin, Jiarui Yao, and Sanjeev Arora as they investigate why language models serve as poor implicit reward models and what this means for the future of AI alignment strategies.

Syllabus

AI FALLS: DPO RL crumbles (Princeton)

Taught by

Discover AI

