MoE Token Routing Explained - How Mixture of Experts Works with Code

Hugging Face via YouTube

Class Central Classrooms (beta): YouTube videos curated by Class Central.

Classroom Contents


  1. Introduction
  2. Laying the Foundation for Mixture of Experts (MoE)
  3. Focus on Token Routing
  4. What Is a Mixture of Experts Layer?
  5. Problem Statement and Configurations
  6. Compute Router Logits
  7. Sparsity and Selecting Top-K Experts
  8. Normalizing Logits to Router Probabilities
  9. Slot Selection
  10. Dropping Oversubscribed Tokens
  11. Updated Normalized Token Weights
  12. Updated Slot Selection and Token Slots
  13. Final Weight Matrix Construction
  14. Conclusion
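The routing pipeline outlined in the chapters above (router logits, top-K sparsity, softmax normalization, capacity-limited slot selection with token dropping, and the final weight matrix) can be sketched in a few dozen lines of NumPy. The configuration below (8 tokens, 4 experts, top-2 routing, 3 slots per expert) is an illustrative assumption, not the video's actual numbers, and the greedy token-order slot assignment is one common policy, not necessarily the exact one shown in the video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative configuration (assumed, not taken from the video).
num_tokens, hidden_dim = 8, 16
num_experts, top_k = 4, 2
capacity = 3  # each expert exposes `capacity` slots

tokens = rng.normal(size=(num_tokens, hidden_dim))
router_w = rng.normal(size=(hidden_dim, num_experts))

# Step: compute router logits, one score per (token, expert) pair.
logits = tokens @ router_w

# Step: sparsity, keep only each token's top-K experts.
topk_idx = np.argsort(logits, axis=-1)[:, -top_k:][:, ::-1]  # best expert first
topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)

# Step: normalize the selected logits into router probabilities (softmax).
exp = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# Step: slot selection with dropping. Tokens claim expert slots greedily;
# once an expert's `capacity` slots are full, later tokens routed to it
# are dropped (their weight for that expert stays zero).
slot_count = np.zeros(num_experts, dtype=int)
weights = np.zeros((num_tokens, num_experts))  # final combine-weight matrix
for t in range(num_tokens):
    for k in range(top_k):
        e = topk_idx[t, k]
        if slot_count[e] < capacity:
            slot_count[e] += 1
            weights[t, e] = probs[t, k]

# Step: renormalize surviving weights so each undropped token's weights sum to 1.
row_sum = weights.sum(axis=-1, keepdims=True)
weights = np.where(row_sum > 0, weights / np.maximum(row_sum, 1e-9), 0.0)
```

The resulting `weights` matrix has shape `(num_tokens, num_experts)`; multiplying each expert's output by its column and summing gives the layer output, and fully dropped tokens pass through with zero expert contribution.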
