Tackling Challenges in Scaling Test-Time Compute
MLOps World: Machine Learning in Production via YouTube
Overview
Explore the emerging paradigm of test-time computation in reasoning models through this 31-minute conference talk, which examines how models like OpenAI's o3 and DeepSeek's R1 are departing from traditional LLM approaches. Learn the fundamental concept of test-time computation and the dimensions along which it can be spent and scaled effectively. Understand the key limitations of current test-time scaling paradigms, including challenges in thought budgeting, context management, and confidence estimation. Gain insight into potential solutions to these obstacles, along with best practices for prompting reasoning models to maximize their effectiveness. The session provides a comprehensive overview of how reasoning models leverage test-time compute to solve complex tasks that demand advanced reasoning, challenging established assumptions in the field.
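One common dimension for spending test-time compute, mentioned in discussions of this paradigm, is drawing multiple independent reasoning samples and majority-voting over their final answers (self-consistency). The sketch below is a minimal illustration of that idea, not code from the talk: the `sample_answer` stub is a hypothetical stand-in for a stochastic LLM call, so the names and probabilities are assumptions for demonstration only.

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one stochastic reasoning pass of an LLM.
    # Toy behavior: answers "42" 70% of the time, otherwise a near miss.
    return "42" if rng.random() < 0.7 else rng.choice(["41", "43"])

def self_consistency(question: str, n_samples: int, seed: int = 0) -> str:
    # Spend more test-time compute by drawing n_samples independent
    # reasoning traces, then majority-vote over their final answers.
    rng = random.Random(seed)
    answers = [sample_answer(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# More samples -> more test-time compute -> a more reliable aggregate answer.
print(self_consistency("What is 6 * 7?", n_samples=1))
print(self_consistency("What is 6 * 7?", n_samples=200))
```

The point of the sketch is the scaling knob: `n_samples` trades inference cost for reliability, which is also where the talk's challenges bite, since a fixed thought budget caps how many samples you can afford and confidence estimation determines when more samples stop helping.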
Syllabus
Tackling challenges in scaling test-time compute