Overview
Explore advanced techniques in Sparse Autoencoders (SAEs) for interpreting large language models in this 75-minute lecture from the University of Utah's CS 6966 course on LLM Interpretability. The lecture covers recent developments in SAE methodology that deepen our understanding of how LLMs process and represent information internally. Building on foundational SAE concepts, it examines approaches for decomposing neural network activations into interpretable components, including recent research on sparse coding techniques, improved training methodologies, and novel applications of SAEs in mechanistic interpretability. These advanced SAE methods help make black-box language models more transparent and explainable, an essential step toward trustworthy AI systems.
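To make the core idea concrete, here is a minimal sketch of what an SAE does: a ReLU encoder maps a model activation into an overcomplete set of sparse features, and a linear decoder reconstructs the activation, trained with a reconstruction loss plus an L1 sparsity penalty. All dimensions, weights, and the penalty coefficient below are illustrative assumptions, not details from the lecture.

```python
import numpy as np

# Illustrative sparse-autoencoder sketch; dimensions are arbitrary
# assumptions, not taken from the lecture.
rng = np.random.default_rng(0)

d_model, d_sae = 16, 64          # activation dim, overcomplete dictionary size
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation vector into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU encoder -> sparse features
    x_hat = f @ W_dec + b_dec                # linear decoder
    return f, x_hat

x = rng.normal(size=d_model)                 # stand-in for an LLM activation
f, x_hat = sae_forward(x)
# Training objective: reconstruction error + L1 penalty encouraging sparse f
loss = np.sum((x - x_hat) ** 2) + 1e-3 * np.sum(np.abs(f))
```

The overcomplete dictionary (d_sae > d_model) combined with the sparsity penalty is what encourages each feature to capture a single interpretable direction in activation space.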
Syllabus
UUtah CS 6966 Interpretability of LLMs | Spring 2026 | SAE advances: Part 2
Taught by
UofU Data Science