Understanding Reasoning LLMs - o1, DeepSeek-R1, Gemini Thinking, Grok 3, Claude 3.7

This video explores the recent wave of reasoning-focused large language models, including OpenAI's o1, Google's Gemini Thinking, DeepSeek's R1, xAI's Grok 3, and Anthropic's Claude 3.7. Dive into what reasoning models actually are, how they're trained, and why they represent an evolution rather than a revolution in AI capabilities. Learn about the four main approaches to building reasoning LLMs: inference-time scaling, pure reinforcement learning, supervised fine-tuning combined with reinforcement learning, and distillation. Discover why these models, despite their improved performance on structured problems, still operate as next-token predictors built on the transformer architecture, like other LLMs. The presentation includes detailed explanations of training pipelines, with a special focus on DeepSeek's R1 implementation, and concludes with an examination of the current limitations and challenges facing reasoning LLMs. A downloadable canvas/mind map is provided to help visualize these concepts.

Syllabus

00:00 - Introduction
02:42 - What are reasoning models?
03:56 - The four approaches to building "reasoning" LLMs
04:31 - Inference-time scaling
06:46 - Standard LLM training pipeline
08:26 - Pure Reinforcement Learning (DeepSeek R1-Zero)
12:21 - Supervised Fine-Tuning + Reinforcement Learning (DeepSeek R1)
17:20 - Summary of the SFT+RL approach (DeepSeek R1)
18:18 - Distillation
21:55 - Limitations and challenges of reasoning LLMs