MiniMax-01 Theory Overview - Lightning Attention + MoE + FlashAttention Optimization
Yacine Mahdid via YouTube
Syllabus
- Introduction: 0:00
- Model Overview: 3:04
- Main Result Overview: 8:14
- Background Information on Linear Attention: 11:00
- Lightning Attention Overview: 16:07
- I/O Optimization: 22:20
- Pre-training Recipe: 25:10
- Post-training Recipe: 26:31
- Full Results: 30:42
- Vision Modality for MiniMax-VL-01: 37:24
- Demo of MiniMax-Text-01: 41:20
- Final Words: 45:04
Taught by
Yacine Mahdid