
YouTube

MoE Models Don't Work Like You Think - Inside GPT-OSS

Chris Hay via YouTube

Overview

Explore the inner workings of mixture-of-experts (MoE) models through an in-depth analysis of GPT-OSS-20B, OpenAI's first open-weight model since GPT-2. Challenge common misconceptions about how MoE models operate by examining whether these systems actually contain specialized domain experts for mathematics, coding, or language tasks. Discover through empirical investigation that the reality of expert specialization differs significantly from popular assumptions. Analyze the architecture of this transformer-based MoE model and learn how it processes information differently than expected. Investigate token-routing mechanisms and uncover patterns using trigram analysis. Examine attention mechanisms and their role in model behavior, and distinguish between position specialists and context specialists within the expert framework. Access the accompanying research materials and code implementations to deepen your understanding of these architectures and their surprising operational characteristics.
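An MoE layer replaces a single feed-forward block with many expert networks and a router that sends each token to only a few of them. As a rough illustration of that routing step (not code from the video; the toy dimensions and the top-4-of-32 configuration are assumptions based on the published GPT-OSS-20B model card), a minimal top-k router can be sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_route(hidden, router_w, k=4):
    """Top-k MoE routing: score every expert for one token,
    keep the k highest-scoring experts, and softmax-normalize
    their weights so they sum to 1."""
    logits = hidden @ router_w                      # (num_experts,) router scores
    top_k = np.argsort(logits)[-k:]                 # indices of the k best experts
    weights = np.exp(logits[top_k] - logits[top_k].max())
    weights /= weights.sum()
    return top_k, weights

d_model, num_experts = 8, 32                        # toy hidden size; 32 experts per layer
hidden = rng.standard_normal(d_model)               # one token's hidden state
router_w = rng.standard_normal((d_model, num_experts))

experts, weights = moe_route(hidden, router_w, k=4)
print(experts, weights)
```

The token's output would then be the weighted sum of the four selected experts' outputs; only those experts run, which is what makes a 20B-parameter MoE cheap to execute per token.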

Syllabus

- Intro
- Dense vs MoE models
- Not Domain Experts
- Disproving Token Routing
- Identifying patterns with TriGrams
- Attention is all you need
- Position Specialists vs Context Specialists
- Conclusion

Taught by

Chris Hay

