Overview
Syllabus
Recursive Nanochat - a comparison with Karpathy’s 500M parameter model
Benchmark results on Recursive Nanochat
What are the benefits of recursive models?
Recursive models allow inference on smaller devices and fewer GPUs
Recursive models open a pathway to adaptive compute
Recursive vs non-recursive architecture
How to handle the recursive stream via an adapter
Training for adaptive compute (variable recursion counts) - Poisson log-normal recursion sampling
Handling torch.compile with recursive models
Implementing adaptive compute - stopping recursions early
KV cache strategies for recursive models
Implications of the vLLM inference engine for recursive models
Training dynamics of recursive models - Wandb overview, including FLOPs utilisation
Code review of Trelis/nanochat
Truncated backpropagation through time
Recursive loop adapter initialisation
Dynamic torch.compile
Wrap up
Taught by
Trelis Research