
Reinforcement Learning from Human Feedback - Progress and Challenges

UC Berkeley EECS via YouTube

Overview

Explore a lecture from UC Berkeley's EECS Colloquium on the progress and challenges of reinforcement learning from human feedback (RLHF). Learn about key concepts including hallucination, behavior cloning, model uncertainty, and improving factuality in AI systems. Discover practical applications through discussions of retrieval, source citation, RL environments, and browsing capabilities. Examine open problems in AI development such as scalable oversight, optimization for correctness, and creativity. The hour-long presentation also touches on classical literature, philosophy, and methods for forecasting AI progress, offering insights for researchers, developers, and AI enthusiasts interested in the future of human-AI interaction.

Syllabus

Introduction
Overview
Hallucination
Conceptual Model
Behavior Cloning
Does the model know
Uncertainty
When should you hedge
Long form answers
Improving factuality
Challenges
Retrieval and Citing Sources
RL Environment
RL Task
RL Pipeline
Browsing
Dagger
Open problems
Scalable oversight
Optimization for correctness
Creativity
Classical Literature and Philosophy
AI Progress Forecasting

Taught by

UC Berkeley EECS

