Overview
This lecture by Azalia Mirhoseini (Stanford/DeepMind) explores inference compute as an emerging frontier for scaling Large Language Models (LLMs). Discover how the "Large Language Monkeys" research shows that coverage (the fraction of problems solved by at least one sample) follows a predictable log-linear relationship with the number of inference samples across four orders of magnitude, suggesting the existence of inference-time scaling laws. Learn how these coverage gains translate into better performance in domains with automatic verification, such as coding and formal proofs, and why identifying correct samples without a verifier remains challenging. Explore the Archon framework, which automatically designs effective inference-time systems by selecting, combining, and stacking operations such as repeated sampling, fusion, ranking, and verification to optimize LLM performance across diverse tasks. The talk concludes with hardware acceleration techniques for making LLM serving more computationally efficient.
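To make "coverage" concrete, here is a minimal sketch of how it can be estimated from repeated sampling, assuming that for each problem you generated n samples and a verifier marked c of them correct. It uses the unbiased pass@k estimator commonly used for code-generation benchmarks; whether the talk computes coverage exactly this way is an assumption, and the helper names pass_at_k and coverage are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the chance that at least one of k samples,
    drawn without replacement from n samples with c correct, is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts size-k subsets with no correct sample;
    # math.comb returns 0 when k > n - c, giving an estimate of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(correct_counts: list[int], n: int, k: int) -> float:
    """Hypothetical helper: mean pass@k over problems, i.e. the expected
    fraction of problems solved by at least one of k samples."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Illustrative (made-up) verifier counts for 4 problems, 10 samples each.
    counts = [0, 1, 5, 10]
    for k in (1, 2, 5, 10):
        print(k, round(coverage(counts, n=10, k=k), 3))
```

Plotting coverage against log(k) for real sample counts is one way to check for the log-linear trend the lecture describes; the toy counts above are far too small to show it.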
Syllabus
Inference Scaling: A New Frontier for AI Capability
Taught by
Simons Institute