Overview
Explore the intricacies of Reinforcement Learning from Human Feedback (RLHF) in this 53-minute talk by Wei Xiong of UIUC, presented at the Simons Institute. Delve into the mathematical foundations of RLHF, examining its formulation as a reverse-KL regularized contextual bandit problem and its statistical efficiency. Discover how continuous online exploration through interaction with human evaluators enhances RLHF's effectiveness. Learn about a novel, provably efficient online iterative training framework that gives rise to new RLHF algorithms such as iterative direct preference learning. Gain practical insights into building state-of-the-art chatbots from open-source data, as demonstrated in the RLHFlow project. This talk, part of the "Emerging Generalization Settings" series, offers a deep dive into cutting-edge techniques for aligning large language models with human preferences.
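To make the connection between the reverse-KL regularized objective and direct preference learning concrete, here is a minimal sketch (not the speaker's code; all names are illustrative assumptions) of the standard DPO loss for a single preference pair. The policy's implicit reward is its log-probability ratio against a fixed reference model, and `beta` plays the role of the KL-regularization strength in the bandit objective:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    beta is the KL-regularization strength from the reverse-KL
    regularized contextual bandit objective: a larger beta keeps
    the learned policy closer to the reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (policy_logp_chosen - ref_logp_chosen) - \
             (policy_logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the scaled margin (Bradley-Terry likelihood).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# At zero margin the loss is exactly log 2; a positive margin
# (policy already prefers the chosen response) drives it lower.
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))  # log 2 ≈ 0.6931
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))  # below log 2
```

In the iterative variant discussed in the talk, this offline loss is wrapped in an online loop: the current policy generates new responses, fresh preference labels are collected, and the loss is minimized again on the enlarged dataset.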
Syllabus
Iterative preference learning methods for large language model post-training
Taught by
Simons Institute