Classroom Contents
Muon Optimizer for Dense Linear Layer Explained - Newton-Schulz + Momentum
- 1 - introduction: 0:00
- 2 - why is Muon useful?: 2:04
- 3 - Adam overview: 3:30
- 4 - AdamW overview: 4:32
- 5 - what is Muon doing?: 7:31
- 6 - Muon authors overview: 8:26
- 7 - Muon results: 10:39
- 8 - Kimi K2 performance with MuonClip: 12:29
- 9 - what does Muon do?: 13:54
- 10 - deep dive into Newton-Schulz: 16:52
- 11 - coding Muon in NumPy: 27:59