Overview
Explore the fundamental mechanisms of memory storage and retrieval in large language models through an in-depth workshop examining the distinction between weights and activations as memory systems. Delve into the critical "long-tail" knowledge problem, where LLMs excel at general knowledge but fail at niche, specialized tasks that fall outside their training data or knowledge cutoffs. Analyze three paradigms for knowledge injection: full context, Retrieval Augmented Generation (RAG), and the proposed solution of training knowledge directly into model weights.

Understand the severe limitations of context-based approaches, including the quadratic complexity of self-attention, which creates prohibitive cost and latency: benchmarks show throughput dropping from 10,000 tokens per second with 1,000 tokens of context to just 130 tokens per second with 128,000 tokens. Examine the concept of "context rot," where reasoning capabilities degrade as context length increases, even when models don't break entirely.

Learn the distinction between activations as expensive short-term working memory and weights as efficient long-term storage for static, specialized knowledge, and discover why treating weights as a memory system offers a more efficient alternative to repeatedly feeding context on every inference cycle. Engage with forward-looking discussions on the potential return of federated learning for specialized knowledge updates, the debate between specialized models and general reasoning engines that rely entirely on external tools, and the implications for temporal information handling in future LLM architectures.
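The quadratic self-attention cost described above can be illustrated with a toy NumPy sketch. This is illustrative only (the function name and shapes are our own; real inference uses multi-head attention, KV caching, and fused kernels), but it shows why the score matrix, and hence compute and memory, grows with the square of the context length:

```python
import numpy as np

def attention_weights(n_tokens, d_model=64, seed=0):
    """Toy single-head self-attention: the score matrix is
    n_tokens x n_tokens, so cost grows quadratically with context."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_tokens, d_model))
    k = rng.standard_normal((n_tokens, d_model))
    scores = q @ k.T / np.sqrt(d_model)      # shape (n_tokens, n_tokens)
    # Row-wise softmax over the keys.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

# Number of score-matrix entries at different context lengths:
# 1,000 tokens -> 1e6 entries; 128,000 tokens -> ~1.6e10 entries.
for n in (1_000, 8_000, 128_000):
    print(f"{n:>7} tokens -> {n * n:,} attention entries")
```

A 128x increase in context length means roughly a 16,000x increase in attention work, which is consistent with the throughput collapse the workshop cites.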
Syllabus
The Knowledge Cutoff & Long-Tail Problem
Three Methods for Knowledge Injection: Context, RAG, Weights
Limitations of "Full Context": Cost & Latency
The Transformer Bottleneck: Self-Attention Complexity
Context Rot: Performance Degradation in Long Context
Q&A: The Return of Federated Learning
Q&A: Specialized Knowledge Models vs. Karpathy’s "Reasoning Engines"
Q&A: Temporal Information & Future Research
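The three knowledge-injection paradigms from the syllabus can be contrasted in a minimal sketch. Everything here is hypothetical scaffolding (the ToyIndex class, its keyword-overlap search, and the prompt-assembly helpers are our own, not part of any workshop code); the point is where the knowledge lives in each paradigm:

```python
class ToyIndex:
    """Naive keyword-overlap retriever standing in for a vector index."""
    def __init__(self, docs):
        self.docs = docs

    def search(self, query, k=2):
        terms = set(query.lower().split())
        # Rank documents by shared terms with the query, best first.
        return sorted(self.docs,
                      key=lambda d: -len(terms & set(d.lower().split())))[:k]

def full_context_prompt(question, docs):
    # Paradigm 1 (full context): every document goes into the prompt,
    # so attention cost grows quadratically with total prompt length.
    return "\n".join(docs) + "\n\nQ: " + question

def rag_prompt(question, index, k=2):
    # Paradigm 2 (RAG): retrieve only the top-k relevant chunks first,
    # keeping the prompt, and the attention cost, small.
    return "\n".join(index.search(question, k)) + "\n\nQ: " + question

# Paradigm 3 (weights): no prompt-assembly step at all. The specialized
# knowledge is trained into the model, so inference sees only the question.

docs = ["the widget api uses port 7070",
        "unrelated release notes",
        "widget auth tokens expire daily"]
idx = ToyIndex(docs)
print(rag_prompt("widget api port?", idx, k=1))
```

The trade-off the workshop explores: paradigms 1 and 2 pay a per-inference cost in activations (working memory), while paradigm 3 pays a one-time training cost to store the knowledge in weights (long-term memory).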
Taught by
AI Engineer