KV cache strategies for recursive models
YouTube videos curated by Class Central.
Classroom Contents
Training Recursive Models - A Frontier in Adaptive Compute
- 1 Recursive Nanochat - a comparison with Karpathy’s 500M parameter model
- 2 Benchmark Results on Recursive Nanochat
- 3 What are the benefits of recursive models?
- 4 Recursive Models allow inference on smaller devices and fewer GPUs
- 5 Recursive Models open a pathway to adaptive compute
- 6 Recursive vs Non-recursive Architecture
- 7 How to handle the recursive stream via an adapter
- 8 Training for adaptive compute / recursions - Poisson log-normal recursion sampling
- 9 Handling torch.compile with recursive models
- 10 Implementing adaptive compute - stopping recursions early
- 11 KV cache strategies for recursive models
- 12 Inference engine (vLLM) implications for recursive models
- 13 Training dynamics of recursive models - Wandb overview, incl. FLOPs utilisation
- 14 Code Review of Trelis/nanochat
- 15 Truncated backpropagation through time
- 16 Recursive loop adapter initialisation
- 17 Dynamic torch.compile
- 18 Wrap up