Overview
Explore the linear representation hypothesis in large language models through this 37-minute lecture from the University of Utah's CS 6966 course on LLM interpretability. Delve into how neural networks encode and process information using linear representations, examining the theoretical foundations and practical implications for understanding model behavior. Learn about the mathematical frameworks that describe how concepts and features are represented as directions in high-dimensional vector spaces within transformer architectures. Investigate current research methodologies for testing and validating the linear representation hypothesis, including techniques for probing internal model states and analyzing activation patterns. Discover how this hypothesis relates to broader interpretability challenges in modern language models and why it matters for building more transparent AI systems. Access the supplementary course notes to reinforce key concepts and deepen your understanding of this fundamental aspect of neural network interpretability.
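To make the probing methodology concrete, here is a minimal sketch of a linear probe, the standard empirical test of the linear representation hypothesis: if a concept is encoded linearly, a simple linear classifier trained on hidden activations should recover it. This example is not from the lecture; the hidden size, the planted concept direction, and the use of synthetic activations in place of real transformer states are all illustrative assumptions.

```python
# A linear probe sketch: synthetic data stands in for cached
# transformer activations. We plant a binary concept along one
# direction and check that a linear classifier recovers it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 768   # hypothetical hidden size
n = 2000        # number of cached activations

# Assume a binary concept (e.g. sentiment) is encoded along one direction.
concept_dir = rng.normal(size=d_model)
concept_dir /= np.linalg.norm(concept_dir)

labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d_model))               # background activations
acts += np.outer(2.0 * labels - 1.0, concept_dir)  # shift along the direction

# Fit the probe on half the data, evaluate on the rest.
probe = LogisticRegression(max_iter=1000).fit(acts[:1000], labels[:1000])
print("probe accuracy:", probe.score(acts[1000:], labels[1000:]))

# The probe's weight vector approximates the concept direction,
# which is exactly what a linear representation predicts.
learned_dir = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
print("cosine with planted direction:", float(learned_dir @ concept_dir))
```

In real interpretability work the activations would be cached from a particular layer of an actual model rather than sampled synthetically; high probe accuracy together with a stable recovered direction is the kind of evidence the hypothesis predicts.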
Syllabus
UUtah CS 6966 Interpretability of LLMs | Spring 2026 | Linear representation hypothesis
Taught by
UofU Data Science