What is the Transformers' Context Window in Deep Learning and How to Make it Long
Yacine Mahdid via YouTube
Overview
Syllabus
- Introduction: 0:00
- Why more context is good: 0:33
- R1 longer context: 1:06
- A little retrieval test: 1:56
- Needle-in-a-haystack: 2:40
- Multi-Round Needle-in-a-haystack: 3:38
- Machine Translation from One Book (MTOB): 4:52
- Attention Calculation Recap: 6:16
- How to encode positions: 8:51
- Issue with increasing context: 10:07
- How to extend context: 11:26
- Fixing positional encoding: 11:45
- Fixing Attention Calculation: 13:21
- Flash Attention: 13:55
- Sparse Attention: 14:52
- Low-Rank Decomposition: 18:14
- Chunking: 19:51
- Other strategies using linear components: 21:44
- Llama 4 changes: 24:12
- Google Long Context Team (Nikolay Savinov): 25:33
- See you, folks!: 26:50
Taught by
Yacine Mahdid