MobileLLM - Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
EDGE AI FOUNDATION via YouTube
Overview
Watch a 20-minute research presentation exploring the development of MobileLLM, a groundbreaking approach to deploying efficient large language models on mobile devices. Learn how deep and thin architectures, embedding sharing, and grouped-query attention mechanisms enable high-performance language models with fewer than a billion parameters. Discover how these optimizations achieve significant accuracy improvements over previous state-of-the-art models in commonsense reasoning tasks, with 2.7% and 4.3% boosts for 125M and 350M parameter models respectively. Understand how this architectural innovation challenges the conventional wisdom that data and parameter quantity are the primary drivers of model quality, while demonstrating comparable performance to much larger models in practical applications like API calling tasks.
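One of the optimizations highlighted above is grouped-query attention, in which several query heads share a single key/value head to cut parameter count and memory. The sketch below is an illustrative NumPy implementation of that sharing pattern, not the MobileLLM code; the function name and shapes are assumptions for the example.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_query_heads, n_kv_heads):
    """Illustrative grouped-query attention (not the MobileLLM source).

    q: (n_query_heads, seq, d); k, v: (n_kv_heads, seq, d),
    with n_query_heads an integer multiple of n_kv_heads.
    """
    group = n_query_heads // n_kv_heads
    # Each KV head is shared by a group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_query_heads, seq, d)
```

With 8 query heads and 2 KV heads, each KV head serves 4 query heads, so the KV projection parameters (and the KV cache at inference time) shrink fourfold relative to standard multi-head attention.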
Syllabus
GenAI on the Edge Forum: MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Taught by
EDGE AI FOUNDATION