Overview
Explore a comprehensive technical breakdown of the leading open-source large language models in this 13-minute video analysis. Dive into the architectural differences between GPT-OSS (OpenAI's first open-weights model since GPT-2), DeepSeek's V3 series, and Alibaba's Qwen models to understand what makes each unique under the hood. Examine their distinct implementations of mixture-of-experts layers, long-context training methodologies, and the post-training techniques that shape reasoning capabilities and alignment. Learn about Qwen-3's training pipeline and reinforcement-learning innovations, DeepSeek V3's Multi-Head Latent Attention (MLA) mechanism and the recent V3.1 updates, and how different design philosophies around model sizing and context handling nonetheless lead to surprisingly comparable performance. Gain insight into the technical decisions behind modern open-source AI development, closing with key takeaways about the current state of accessible large language model architectures.
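To ground the mixture-of-experts theme that runs through all three model families, here is a minimal sketch of top-k expert routing. The shapes, the toy linear "experts", and the softmax-over-selected-experts gating are illustrative assumptions for clarity, not any specific model's implementation:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (n_tokens, d)        token activations
    gate_w:    (d, n_experts)       router weights
    expert_ws: (n_experts, d, d)    one weight matrix per toy expert
    """
    logits = x @ gate_w                          # (n, E) router scores per token
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k highest-scoring experts
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        sel = logits[i, topk[i]]
        probs = np.exp(sel - sel.max())
        probs /= probs.sum()                     # softmax over the selected experts only
        for p, e in zip(probs, topk[i]):
            out[i] += p * (token @ expert_ws[e]) # weighted sum of chosen experts' outputs
    return out

rng = np.random.default_rng(0)
d, n_experts, n_tokens = 8, 4, 3
y = moe_forward(rng.normal(size=(n_tokens, d)),
                rng.normal(size=(d, n_experts)),
                rng.normal(size=(n_experts, d, d)), k=2)
print(y.shape)  # (3, 8)
```

The key property the video's comparisons hinge on: only k of the experts run per token, so parameter count and per-token compute can be scaled independently.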
Syllabus
00:00 – OpenAI OSS Launch
01:00 – Comparing Open Source LLM Architectures
01:46 – GPT OSS Overview
02:37 – Under The Hood of GPT OSS
03:25 – Qwen-3 Architecture
04:17 – Qwen-3 Training
05:12 – Qwen-3 Post-Training
06:08 – Qwen-3 Reasoning & RL Innovations
06:52 – DeepSeek V3 Overview
07:40 – DeepSeek V3.1 Updates
08:39 – Attention Mechanism MLA
09:39 – Comparing Model Sizes
10:35 – Long Context Strategies
11:25 – Reflections on Methods
12:00 – Takeaways
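For the MLA segment above (08:39), the core trick is that DeepSeek caches a low-rank latent instead of full keys and values. The following is a rough single-head sketch under simplifying assumptions: projection names and dimensions are illustrative, and the real mechanism adds per-head projections and decoupled rotary embeddings:

```python
import numpy as np

def mla_single_head(x, w_q, w_dkv, w_uk, w_uv):
    """Latent-attention sketch, reduced to one head for clarity.

    Rather than caching full K/V, only the compressed latent c_kv is
    cached; K and V are re-expanded from it at attention time.
    """
    d = x.shape[-1]
    q = x @ w_q                       # (n, d) queries
    c_kv = x @ w_dkv                  # (n, d_c) compressed KV latent -- the cached tensor
    k = c_kv @ w_uk                   # (n, d) keys reconstructed from the latent
    v = c_kv @ w_uv                   # (n, d) values reconstructed from the latent
    scores = q @ k.T / np.sqrt(d)     # scaled dot-product attention
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = scores / scores.sum(axis=-1, keepdims=True)
    return attn @ v, c_kv

rng = np.random.default_rng(0)
n, d, d_c = 5, 16, 4                  # latent 4x smaller than the full dimension
x = rng.normal(size=(n, d))
out, cache = mla_single_head(x,
                             rng.normal(size=(d, d)),
                             rng.normal(size=(d, d_c)),
                             rng.normal(size=(d_c, d)),
                             rng.normal(size=(d_c, d)))
print(out.shape, cache.shape)         # (5, 16) (5, 4)
```

Because the cache holds `(n, d_c)` rather than `(n, d)` keys plus `(n, d)` values, KV-cache memory shrinks roughly by `2d / d_c`, which is what makes long contexts cheaper at inference time.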
Taught by
Y Combinator