Overview
Explore the inner workings of mixture-of-experts (MoE) models through an in-depth analysis of GPT-OSS-20B, OpenAI's first open-weight model since GPT-2. Challenge common misconceptions about how MoE models operate by examining whether they actually contain specialized domain experts for mathematics, coding, or language tasks. Discover through empirical investigation that expert specialization in practice differs significantly from popular assumptions. Analyze the architecture of this transformer-based MoE model and learn how it processes information differently than expected. Investigate token routing mechanisms and uncover routing patterns using trigram analysis. Examine attention mechanisms and their role in model behavior, and distinguish between position specialists and context specialists among the experts. Access the accompanying research materials and code implementations to deepen your understanding of these language model architectures and their surprising operational characteristics.
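For readers new to the term, token routing in an MoE layer can be sketched roughly as follows. This is a minimal illustration of generic top-k routing, not the course's code or GPT-OSS-20B's actual implementation: the function names, the four-expert setup, k=2, and the router scores are all hypothetical.

```python
import math
from collections import Counter


def top_k_route(logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Rank experts by router score, highest first; ties go to the lower index.
    ranked = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Softmax over only the selected scores so the weights sum to 1.
    peak = max(logits[i] for i in ranked)
    exps = [math.exp(logits[i] - peak) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


def expert_usage(routes):
    """Count how many tokens were sent to each expert."""
    counts = Counter()
    for route in routes:
        for expert, _weight in route:
            counts[expert] += 1
    return counts


# Hypothetical router scores for three tokens over four experts
# (illustrative values only, not taken from any real model).
token_logits = {
    "def": [2.0, 0.5, 1.0, -1.0],
    "the": [0.1, 0.3, 2.2, 2.0],
    "42": [1.5, 1.5, 0.0, 0.2],
}

routes = {tok: top_k_route(scores, k=2) for tok, scores in token_logits.items()}
usage = expert_usage(routes.values())

for tok, route in routes.items():
    print(tok, [(expert, round(weight, 3)) for expert, weight in route])
print(dict(usage))
```

Tallying expert usage per token (or per token sequence, such as trigrams) is one simple way to check whether experts line up with domains like math or code, or with something else entirely.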
Syllabus
- Intro
- Dense vs MoE models
- Not Domain Experts
- Disproving Token Routing
- Identifying patterns with Trigrams
- Attention is all you need
- Position Specialists vs Context Specialists
- Conclusion
Taught by
Chris Hay