Overview
Explore the inner workings of mixture-of-experts (MoE) models through an in-depth analysis of GPT-OSS-20B, OpenAI's first open-weight model since GPT-2. Challenge common misconceptions about how MoE models operate by examining whether these systems actually contain specialized domain experts for mathematics, coding, or language tasks. Discover through empirical investigation that the reality of expert specialization differs significantly from popular assumptions. Analyze the transformer-based MoE architecture of this model and learn how it processes information differently than expected. Investigate token routing mechanisms and uncover routing patterns using trigram analysis. Examine attention mechanisms and their role in model behavior, and learn to distinguish position specialists from context specialists within the expert framework. Access the accompanying research materials and code implementations to deepen your understanding of these advanced language model architectures and their surprising operational characteristics.
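The overview refers to token routing, where each token is sent to a small subset of experts rather than through one dense feed-forward block. The sketch below illustrates that general idea with a toy top-k routing layer in PyTorch; the dimensions, expert count, gating scheme, and class names are illustrative assumptions and do not reflect GPT-OSS-20B's actual implementation.

```python
# Minimal sketch of top-k expert routing in a mixture-of-experts (MoE) layer.
# Illustrative only; sizes and gating details are assumptions, not GPT-OSS-20B's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router produces one score per expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.router(x)                            # (batch, seq, n_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)              # normalize over the selected experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = top_idx[..., slot]                        # expert chosen for this slot, per token
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])  # only routed tokens pass through expert e
        return out


if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(2, 5, 64)
    print(layer(tokens).shape)  # torch.Size([2, 5, 64])
```

Because only top_k of the experts run for any given token, an MoE layer can hold many more parameters than a dense layer while keeping per-token compute roughly constant, which is the contrast the "Dense vs MoE models" chapter draws.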
Syllabus
- Intro
- Dense vs MoE models
- Not Domain Experts
- Disproving Token Routing
- Identifying patterns with TriGrams
- Attention is all you need
- Position Specialists vs Context Specialists
- Conclusion
Taught by
Chris Hay