Overview
This lecture by Tan Zhi Xuan from MIT challenges the dominant assumptions in AI alignment regarding human preferences. Explore a critical examination of whether preferences adequately represent human values, whether human rationality can be reduced to preference maximization, and whether AI systems should simply align with human preferences. Learn why preference judgments should be treated as just one source of evidence about human goals, values, and norms rather than as the foundation for AI alignment. Discover an alternative approach to building AI assistants that infer their human principals' goals and normative standards by modeling how humans make decisions on the basis of these reasons. See how such systems could adapt to context-specific goals while adhering to meta-norms for safe and helpful assistance. Part of the "Alignment, Trust, Watermarking, and Copyright Issues in LLMs" series at the Simons Institute.
Syllabus
Beyond Preferences in AI Alignment: Towards Richer Models of Human Reasons and Decisions
Taught by
Simons Institute