

Enhancing Reasoning of Large Language Models through Reward-Guided Search and Self-Training

Association for Computing Machinery (ACM) via YouTube

Overview

Explore cutting-edge techniques for enhancing the reasoning capabilities of Large Language Models (LLMs) in this keynote presentation from the Large Language Model Day at KDD2024. Delve into innovative approaches that leverage inference-time compute for self-improvement, including search in the space of thoughts and self-training. Discover how generalizable, fine-grained reward models are developed by using tree search to automatically collect per-step correctness values for reasoning traces.

Learn about ReST-MCTS, a process-reward-guided tree search algorithm that enables continuous training of policy and reward models without manual annotation. Examine applications of these techniques in strategic game playing, vision-language modeling, and 3D scene generation, and gain insight into how these advances contribute to state-of-the-art language models such as Grok-2. Finally, explore future directions for scaling up self-training and applying online reinforcement learning to unlock even greater intrinsic improvements in LLM capabilities.
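The core idea described above, using a process reward model to guide step-by-step search while collecting per-step values that can later serve as self-training signal, can be illustrated with a toy sketch. This is not the ReST-MCTS algorithm itself: it uses greedy candidate selection instead of Monte Carlo tree search, a hand-coded reward in place of a learned reward model, and an illustrative arithmetic "reasoning" task; all names are hypothetical.

```python
import random

TARGET = 15  # goal of the toy "reasoning" task: reach this running sum

def propose_steps(trace, k=3):
    """Toy policy: propose k candidate next steps (digits to append).
    A real system would sample candidate reasoning steps from an LLM."""
    return [trace + [random.randint(0, 9)] for _ in range(k)]

def process_reward(trace):
    """Toy process reward model: score a partial trace by closeness of
    its running sum to TARGET (higher is better, 0 is perfect).
    A real process reward model would be learned from data."""
    return -abs(TARGET - sum(trace))

def reward_guided_search(depth=4, k=3, seed=0):
    """Greedy reward-guided step search: at each depth, expand k
    candidate continuations, keep the one the reward model scores
    highest, and record its per-step value. The recorded values are
    the kind of per-step supervision that could feed self-training."""
    random.seed(seed)
    trace, step_values = [], []
    for _ in range(depth):
        candidates = propose_steps(trace, k)
        best = max(candidates, key=process_reward)
        step_values.append(process_reward(best))
        trace = best
    return trace, step_values

if __name__ == "__main__":
    trace, values = reward_guided_search()
    print("trace:", trace, "per-step values:", values)
```

The design choice to score every partial trace, rather than only the final answer, is what makes the collected values "fine-grained": each step of the search leaves behind a labeled (partial trace, value) pair without any manual annotation.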

Syllabus

KDD2024 - Enhancing Reasoning of Large Language Models through Reward-Guided Search and Self-Training

Taught by

Association for Computing Machinery (ACM)

