Training Recursive Models - A Frontier in Adaptive Compute

Trelis Research via YouTube

Classroom Contents

  1. Recursive Nanochat - a comparison with Karpathy’s 500M parameter model
  2. Benchmark Results on Recursive Nanochat
  3. What are the benefits of recursive models?
  4. Recursive Models allow inference on smaller devices and fewer GPUs
  5. Recursive Models open a pathway to adaptive compute
  6. Recursive vs Non-recursive Architecture
  7. How to handle the recursive stream via an adapter
  8. Training for adaptive compute / recursions - Poisson log-normal recursion sampling
  9. Handling torch.compile with recursive models
  10. Implementing adaptive compute stopping recursions early
  11. kv cache strategies for recursive models
  12. Inference engine vLLM implications for recursive models
  13. Training dynamics of recursive models: Wandb overview, incl. FLOPs utilisation
  14. Code Review of Trelis/nanochat
  15. Truncated backpropagation through time
  16. Recursive loop adapter initialisation
  17. Dynamic torch compile
  18. Wrap up
