
Interpretability of LLMs - SAE Use Cases and Training Advances

UofU Data Science via YouTube

Overview

Explore advanced applications and training methodologies for Sparse Autoencoders (SAEs) in this university lecture from Utah's CS 6966 course on Large Language Model interpretability. Delve into cutting-edge use cases where SAEs enhance our understanding of neural network internal representations, examining how these techniques reveal interpretable features within complex language models. Learn about recent advances in SAE training procedures, including optimization strategies, architectural improvements, and scaling considerations that make these interpretability tools more effective and practical. Discover how SAEs can be applied to analyze different layers and components of transformer models, providing insights into how LLMs process and represent information. Examine case studies demonstrating successful SAE implementations across various interpretability research scenarios, from feature visualization to mechanistic understanding of model behavior. Gain practical knowledge about the technical challenges involved in training robust SAEs, including handling sparse activation patterns, managing computational costs, and ensuring meaningful feature extraction from high-dimensional neural representations.
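To make the ideas above concrete, here is a minimal sketch of a sparse autoencoder's forward pass and training loss, written in pure Python for illustration. The dimensions, weights, and L1 coefficient are invented for this example; real SAEs are trained on high-dimensional transformer activations with learned weight matrices, but the core recipe — reconstruct the input while penalizing feature activations to encourage sparsity — is the same.

```python
# Hypothetical minimal SAE: encode with ReLU (which zeroes out many
# features, giving sparse activations), decode to reconstruct the input,
# and score with reconstruction error plus an L1 sparsity penalty.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    # Encode: features = ReLU(W_enc @ x + b_enc).
    f = relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])
    # Decode: reconstruct the activation from the sparse feature vector.
    x_hat = [h + b for h, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, x_hat, f, l1_coeff=0.01):
    # Reconstruction error keeps features faithful; the L1 term keeps
    # them sparse, which is what makes them interpretable.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = l1_coeff * sum(abs(v) for v in f)
    return recon + sparsity

# Toy usage: a 2-dimensional "activation" mapped to 4 features and back.
x = [1.0, -0.5]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0, 0.5], [0.0, 1.0, 0.0, 0.5]]
b_dec = [0.0, 0.0]

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, x_hat, f)
```

In practice the feature dimension is much larger than the input dimension (an overcomplete dictionary), and the weights are trained with gradient descent; this sketch only shows the objective being optimized.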

Syllabus

UUtah CS 6966 Interpretability of LLMs | Spring 2026 | SAE use cases & training advances

Taught by

UofU Data Science

