Explore recursive neural network models in this 38-minute technical tutorial on implementing and training adaptive compute systems. Learn to build recursive models that can adjust their computational depth at inference time. This allows more efficient deployment on resource-constrained devices while preserving model quality. Compare recursive and non-recursive architectures, see how the recursive stream is handled through an adapter, and learn how Poisson log-normal recursion sampling trains a model to work across different recursion counts for adaptive compute. Cover practical implementation techniques including truncated backpropagation through time, dynamic torch.compile strategies, and KV cache strategies for recursive architectures. Examine benchmark results comparing recursive models against standard implementations, analyze training dynamics through Wandb visualizations (including FLOPs utilization), and understand the implications for inference engines such as vLLM. Finish with a detailed code review of the Trelis/nanochat implementation, covering recursive loop adapter initialization, early stopping for adaptive compute, and strategies for deploying these models on hardware ranging from single GPUs to distributed systems.
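As a rough illustration of the Poisson log-normal recursion sampling mentioned above, here is a minimal sketch. It assumes the formulation used in recurrent-depth work such as Geiping et al. (2025). First draw τ ~ N(log r̄ − σ²/2, σ), then set the recursion count r = Poisson(e^τ) + 1, so that E[r] = r̄ + 1. The function name, the default values, and the optional `max_recursions` cap are hypothetical choices made for illustration. The Trelis/nanochat code shown in the video may parameterize this differently.

```python
import math
import random
from typing import Optional


def sample_num_recursions(mean_recursions: float = 4.0,
                          sigma: float = 0.5,
                          max_recursions: Optional[int] = None,
                          rng: Optional[random.Random] = None) -> int:
    """Sample a recursion count r = Poisson(exp(tau)) + 1,
    with tau ~ Normal(log(mean_recursions) - sigma**2 / 2, sigma).

    The -sigma**2 / 2 shift makes E[exp(tau)] equal mean_recursions,
    so E[r] = mean_recursions + 1 (ignoring the optional cap).
    """
    if rng is None:
        rng = random.Random()
    tau = rng.gauss(math.log(mean_recursions) - 0.5 * sigma ** 2, sigma)
    rate = math.exp(tau)

    # Knuth's Poisson sampler: multiply uniforms until the product drops
    # to exp(-rate) or below; adequate for the small rates used here.
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()

    r = k + 1  # the +1 guarantees at least one pass through the recursive block
    if max_recursions is not None:  # hypothetical cap to bound compute/memory
        r = min(r, max_recursions)
    return r


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_num_recursions(4.0, 0.5, rng=rng) for _ in range(20_000)]
    # Every draw is >= 1, and the empirical mean should land near 4.0 + 1 = 5.
    print(min(draws), sum(draws) / len(draws))
```

The intent of drawing a fresh r at each training step is to expose the model to many recursion depths, so that inference can later run with a different number of recursions. When this is combined with truncated backpropagation through time, a training loop would typically run the first r − k recursions without gradient tracking and backpropagate only through the last k. This keeps activation memory roughly bounded whatever r is drawn. Here k is another assumed hyperparameter.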

Syllabus

Recursive Nanochat: a comparison with Karpathy’s 500M-parameter model
Benchmark Results on Recursive Nanochat
What are the benefits of recursive models?
Recursive Models allow inference on smaller devices and fewer GPUs
Recursive Models open a pathway to adaptive compute
Recursive vs Non-recursive Architecture
How to handle the recursive stream via an adapter
Training for adaptive compute (variable recursions): Poisson log-normal recursion sampling
Handling torch.compile with recursive models
Implementing adaptive compute: stopping recursions early
KV cache strategies for recursive models
Implications of recursive models for the vLLM inference engine
Training dynamics of recursive models: Wandb overview, incl. FLOPs utilisation
Code Review of Trelis/nanochat
Truncated backpropagation through time
Recursive loop adapter initialisation
Dynamic torch.compile
Wrap up
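To make the adaptive-compute early exit from the syllabus concrete, here is a minimal sketch that assumes a simple convergence criterion. The loop keeps applying the shared recursive block until the latent state stops changing, meaning its relative L2 change falls below a tolerance, or until a maximum recursion budget is reached. The function names, the tolerance, and the toy contraction that stands in for the transformer block are all hypothetical. The video's implementation may use a different exit criterion, for example one that compares successive output distributions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def recurse_with_early_exit(step: Callable[[Vector, Vector], Vector],
                            embedding: Vector,
                            init_state: Vector,
                            max_recursions: int,
                            tol: float = 1e-3) -> Tuple[Vector, int]:
    """Apply `step` up to max_recursions times, exiting early once the
    relative change in the latent state falls below `tol`.

    Returns (final_state, number_of_recursions_used).
    """
    state = init_state
    for i in range(1, max_recursions + 1):
        new_state = step(embedding, state)
        # L2 norm of the update, relative to the size of the new state.
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_state, state)))
        scale = math.sqrt(sum(a * a for a in new_state)) + 1e-12
        state = new_state
        if delta / scale < tol:
            return state, i  # converged: skip the remaining recursions
    return state, max_recursions  # exhausted the compute budget


def toy_block(embedding: Vector, state: Vector) -> Vector:
    """Hypothetical stand-in for the shared block: a contraction toward 2 * embedding."""
    return [0.5 * s + e for s, e in zip(state, embedding)]


if __name__ == "__main__":
    emb = [1.0, -2.0, 0.5]
    final_state, used = recurse_with_early_exit(toy_block, emb, [0.0] * 3, max_recursions=32)
    print(used, final_state)  # uses 10 of the 32 allowed recursions
```

With the toy block and the default tolerance, the loop exits after 10 of the 32 allowed recursions. Tightening `tol` buys more recursions, and so more compute, for a closer approach to the fixed point. The sketch handles a single vector and ignores the KV cache. In batched decoding, if keys and values are cached per recursion step, tokens that exit after different numbers of recursions leave missing entries for later steps. That is one reason KV cache handling needs extra care in recursive models.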