
DPO and Reasoning Language Models

UofU Data Science via YouTube

Overview

Learn about Direct Preference Optimization (DPO) and reasoning language models in this lecture from the University of Utah Data Science program. The lecture covers the theoretical foundations and practical applications of DPO as an alternative to reinforcement learning from human feedback (RLHF) for aligning language models with human preferences. Unlike RLHF, DPO trains directly on preference data instead of first fitting a separate reward model and then running reinforcement learning. It also examines how the reasoning capabilities of large language models can be improved through training methods and architectural choices. Case studies and implementation strategies show how to build more reliable and interpretable AI systems that can handle complex reasoning tasks. Presentation slides accompany the 81-minute session so viewers can review key concepts and follow the technical demonstrations.

Syllabus

DPO & Reasoning LMs

Taught by

UofU Data Science
