
Beyond Preferences in AI Alignment: Towards Richer Models of Human Reasons and Decisions

Simons Institute via YouTube

Overview

This lecture by Tan Zhi Xuan from MIT challenges the dominant assumptions in AI alignment regarding human preferences. Explore a critical examination of whether preferences adequately represent human values, whether human rationality can be reduced to preference maximization, and whether AI systems should simply align with human preferences. Learn why preference judgments should be treated as just one source of evidence about human goals, values, and norms rather than as the foundation of AI alignment. Discover an alternative approach to building AI assistants that infer their human principals' goals and normative standards by modeling how humans make decisions based on these reasons. See how such systems could adapt to context-specific goals while adhering to meta-norms for safe and helpful assistance. Part of the "Alignment, Trust, Watermarking, and Copyright Issues in LLMs" series at the Simons Institute.

Syllabus

Beyond Preferences in AI Alignment: Towards Richer Models of Human Reasons and Decisions

Taught by

Simons Institute

