This video explores the technical details of the Qwen3 models and how they were built with a dual-mode capability: a thinking mode for step-by-step reasoning and a non-thinking mode for fast, direct responses. Learn about the pre-training phases, and about the post-training stages that let a single Qwen3 model switch dynamically between thinking and non-thinking modes. Discover the Strong-to-Weak Distillation process used to build the smaller Qwen3 models from the larger flagship ones; the full family spans 0.6B to 235B parameters. The 19-minute video breaks down the technical report published by the Qwen Team and explains the architecture and training methodology behind the model family.
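Distillation of this kind trains a small "student" model to reproduce the output distribution of a larger "teacher". The snippet below is a minimal, generic sketch of a logit-level distillation objective, assuming the student is trained to minimize the KL divergence from the teacher's next-token distribution to its own. It is not the Qwen Team's code, and the function names `softmax` and `distillation_kl` and the `temperature` parameter are hypothetical illustrations.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Turn raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """KL(p_teacher || p_student) for one next-token position.

    Hypothetical helper: a generic distillation loss, not the
    Qwen3 training code.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Tokens the teacher assigns zero probability contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_kl(teacher, [2.0, 1.0, 0.1]))  # matching student
    print(distillation_kl(teacher, [0.1, 1.0, 2.0]))  # mismatched student
```

The loss is zero when the student's distribution matches the teacher's and grows as they diverge. In a real training loop it would be averaged over every token position and minimized with respect to the student's parameters only.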