Interpretability of LLMs - Generating SAE Feature Descriptions - Spring 2026

Explore techniques for generating descriptions of Sparse Autoencoder (SAE) features in this graduate-level computer science lecture from the University of Utah's CS 6966 course on Large Language Model interpretability. Examine the methods and computational approaches used to automatically create meaningful descriptions of the features that sparse autoencoders learn when applied to large language models. Learn how generating SAE feature descriptions contributes to understanding the internal representations and mechanisms of LLMs, covering both the theoretical foundations and the practical implementation challenges. See how these descriptions help researchers and practitioners interpret what specific neurons or combinations of features represent within neural language models, in support of broader work on AI interpretability and explainability.