Overview
Learn about Direct Preference Optimization (DPO) and reasoning language models in this comprehensive lecture from the University of Utah Data Science program. Explore the theoretical foundations and practical applications of DPO as an alternative to reinforcement learning from human feedback (RLHF) for aligning language models with human preferences. Discover how reasoning capabilities can be enhanced in large language models through various training methodologies and architectural considerations. Examine case studies and implementation strategies for developing more reliable and interpretable AI systems that can perform complex reasoning tasks. Access accompanying presentation slides to reinforce key concepts and follow along with technical demonstrations throughout this 81-minute academic session.
Syllabus
DPO & Reasoning LMs
Taught by
UofU Data Science