Overview
Explore research from Princeton University and the University of Illinois on critical flaws in the implicit reward models used for reinforcement learning alignment. See why Direct Preference Optimization (DPO) and similar implicit reward approaches generalize poorly, while the explicit reward models used in Reinforcement Learning from Human Feedback (RLHF) continue to perform well. Examine the performance gap between the two approaches and the underlying mechanisms that cause implicit reward models to struggle with generalization. Learn about the findings of researchers Noam Razin, Yong Lin, Jiarui Yao, and Sanjeev Arora as they investigate why language models serve as poor implicit reward models and what this means for AI alignment strategies.
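For readers new to the term, an "implicit reward" in DPO is usually defined as a scaled log-probability ratio between the trained policy and a frozen reference model, rather than the output of a separately trained reward head. The sketch below illustrates only that standard definition. The function names, the beta value, and the log-probabilities are hypothetical, and none of it is taken from the video or the researchers' code.

```python
def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)).

    logp_policy and logp_ref are summed token log-probabilities of the same
    response under the trained policy and the frozen reference model.
    """
    return beta * (logp_policy - logp_ref)


def prefers_chosen(reward_chosen: float, reward_rejected: float) -> bool:
    """A reward model ranks a pair correctly if the chosen response scores higher."""
    return reward_chosen > reward_rejected


# Hypothetical log-probabilities for one preference pair (illustrative numbers only).
r_chosen = implicit_reward(logp_policy=-12.0, logp_ref=-15.0)
r_rejected = implicit_reward(logp_policy=-14.0, logp_ref=-13.0)

print(prefers_chosen(r_chosen, r_rejected))
```

An explicit reward model would instead compute the score with a dedicated scalar output trained on preference pairs. The generalization gap discussed in the video concerns how often each kind of scorer ranks unseen pairs correctly.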
Syllabus
AI FALLS: DPO RL crumbles (Princeton)
Taught by
Discover AI