Overview
Explore techniques for generating descriptions of Sparse Autoencoder (SAE) features in this graduate-level computer science lecture from the University of Utah's CS 6966 course on the interpretability of large language models (LLMs). Examine the methods and computational approaches used to automatically produce meaningful descriptions of the features that sparse autoencoders learn when applied to LLMs. Learn how these generated descriptions contribute to understanding the internal representations and mechanisms of LLMs, covering both the theoretical foundations and the practical implementation challenges. See how the descriptions help researchers and practitioners interpret what specific neurons or feature combinations represent within neural language models, in support of broader work on AI interpretability and explainability.
Syllabus
UUtah CS 6966 Interpretability of LLMs | Spring 2026 | Generating SAE feature descriptions
Taught by
UofU Data Science