Overview
Explore research from Princeton University and the University of Illinois that reveals critical flaws in the implicit reward models used for reinforcement learning alignment. Discover why Direct Preference Optimization (DPO) and similar implicit-reward approaches fail to generalize effectively, while the explicit reward models trained in traditional Reinforcement Learning from Human Feedback (RLHF) generalize far more reliably. Examine the performance gap between the two approaches and the underlying mechanisms that cause implicit reward models to struggle with generalization. Learn about the findings of researchers Noam Razin, Yong Lin, Jiarui Yao, and Sanjeev Arora as they investigate why language models make poor implicit reward models and what this means for the future of AI alignment strategies.
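For context on the distinction the course draws, the minimal sketch below (not part of the course materials) shows how DPO's implicit reward is computed: it is simply the scaled log-probability ratio between the trained policy and a frozen reference model, rather than the output of a separately trained reward network as in explicit RLHF. The log-probability values are hypothetical placeholders.

```python
import math

# Minimal sketch of the DPO implicit reward (Rafailov et al., 2023).
# DPO never trains a separate reward network: the reward is implied by the
# policy's log-probability ratio against a frozen reference model,
#   r(x, y) = beta * (log pi_theta(y | x) - log pi_ref(y | x)).

BETA = 0.1  # KL-penalty strength; a common DPO hyperparameter choice


def implicit_reward(logp_policy: float, logp_reference: float, beta: float = BETA) -> float:
    """Implicit reward of a response: scaled log-prob ratio of policy vs. reference."""
    return beta * (logp_policy - logp_reference)


# Hypothetical sequence log-probabilities for a chosen and a rejected response.
chosen_logp_policy, chosen_logp_ref = -42.0, -45.0
rejected_logp_policy, rejected_logp_ref = -50.0, -48.0

r_chosen = implicit_reward(chosen_logp_policy, chosen_logp_ref)
r_rejected = implicit_reward(rejected_logp_policy, rejected_logp_ref)

# The DPO objective increases the implicit reward margin (r_chosen - r_rejected)
# via -log(sigmoid(margin)); an explicit RLHF reward model would instead be a
# separately trained scorer applied to each response.
margin = r_chosen - r_rejected
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
print(f"implicit reward (chosen)   = {r_chosen:.3f}")
print(f"implicit reward (rejected) = {r_rejected:.3f}")
print(f"DPO loss on this pair      = {loss:.4f}")
```

The research discussed in the course examines how well this policy-derived reward ranks responses outside the training distribution, compared with an explicitly trained reward model.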
Syllabus
AI FALLS: DPO RL crumbles (Princeton)
Taught by
Discover AI