This video explores the technical details of the Qwen3 model family and how it was built with a dual-mode capability: a single model can switch dynamically between a thinking mode and a non-thinking mode. Learn about the specific pre-training phases and post-training stages that enable this switching. Discover the Strong-to-weak Distillation process used to train the smaller Qwen3 models from the larger ones, in a family that spans 0.6B to 235B parameters. The 19-minute video breaks down the Qwen3 technical report published by the Qwen Team, explaining the architecture and methodology behind the models.
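To make the idea of Strong-to-weak Distillation more concrete, here is a minimal sketch of logit-level distillation. A student model is trained to match a stronger teacher's next-token distribution by minimizing the KL divergence between them. It assumes a toy vocabulary, forward KL(teacher || student), and a plain per-token average. The function names (`softmax`, `kl_divergence`, `distillation_loss`) are hypothetical illustrations, not the Qwen Team's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Average per-token KL(teacher || student) over a sequence of positions.

    Assumption: each argument is a list of per-position logit vectors,
    and both models share the same vocabulary.
    """
    losses = []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p = softmax(t_logits, temperature)  # teacher distribution (target)
        q = softmax(s_logits, temperature)  # student distribution (trained)
        losses.append(kl_divergence(p, q))
    return sum(losses) / len(losses)

# Usage: two token positions over a 3-token toy vocabulary.
teacher = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
student = [[1.0, 1.0, 1.0], [0.5, 2.0, 0.3]]
print(distillation_loss(teacher, student))  # positive; 0 only if the distributions match
```

The loss is zero when the student reproduces the teacher's distributions exactly. It grows as the distributions diverge, so gradient descent on it pulls the smaller model toward the larger model's behavior.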