Overview
Learn about Direct Preference Optimization (DPO) and reasoning language models in this comprehensive lecture from the University of Utah Data Science program. Explore the theoretical foundations and practical applications of DPO as an alternative to reinforcement learning from human feedback (RLHF) for aligning language models with human preferences. Discover how reasoning capabilities can be enhanced in large language models through various training methodologies and architectural considerations. Examine case studies and implementation strategies for developing more reliable and interpretable AI systems that can perform complex reasoning tasks. Access accompanying presentation slides to reinforce key concepts and follow along with technical demonstrations throughout this 81-minute academic session.
Syllabus
DPO & Reasoning LMs
Taught by
UofU Data Science