Muon Optimizer for Dense Linear Layer Explained - Newton-Schulz + Momentum

Yacine Mahdid via YouTube

YouTube videos curated by Class Central.

Classroom Contents

  1. Introduction: 0:00
  2. Why Muon is useful: 2:04
  3. Adam overview: 3:30
  4. AdamW overview: 4:32
  5. What Muon is doing: 7:31
  6. Muon authors overview: 8:26
  7. Muon results: 10:39
  8. Kimi K2 performance with Muon-Clip: 12:29
  9. What does Muon do?: 13:54
  10. Deep dive into Newton-Schulz: 16:52
  11. Coding Muon in NumPy: 27:59
