Overview
Learn about Direct Preference Optimization (DPO) and reasoning language models in this comprehensive lecture from the University of Utah Data Science program. Explore the theoretical foundations and practical applications of DPO as an alternative to reinforcement learning from human feedback (RLHF) for aligning language models with human preferences. Discover how reasoning capabilities can be enhanced in large language models through various training methodologies and architectural considerations. Examine case studies and implementation strategies for developing more reliable and interpretable AI systems that can perform complex reasoning tasks. Access accompanying presentation slides to reinforce key concepts and follow along with technical demonstrations throughout this 81-minute academic session.
Syllabus
DPO & Reasoning LMs
Taught by
UofU Data Science