Overview
Explore advanced probing techniques for understanding the internal representations and mechanisms of large language models in this 33-minute graduate-level lecture from the University of Utah's CS 6966 course on LLM interpretability. Building on foundational probing concepts, the lecture examines more sophisticated approaches for investigating what neural networks learn and how they process information. Learn to apply computational techniques that reveal hidden patterns in language model architectures, understand how different layers encode linguistic information, and extract interpretable insights from complex neural representations. The lecture also covers experimental design for probing studies, evaluation metrics for interpretability research, and current challenges in making black-box language models more transparent and explainable.
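To make the core idea concrete, a probing study typically trains a small supervised classifier on a model's hidden states to test whether some property is linearly decodable from them. The sketch below is illustrative only and is not taken from the lecture: the "hidden states" are synthetic NumPy arrays standing in for real layer activations, and the probed property is an arbitrary binary label.

```python
import numpy as np

# Illustrative probing setup (not from the lecture): test whether a binary
# property is linearly decodable from a layer's representations. In a real
# study the representations would come from a language model's hidden states;
# here they are synthetic, with the label encoded along one direction.

rng = np.random.default_rng(0)
d = 64    # hidden size
n = 400   # number of examples (e.g., tokens)

direction = rng.normal(size=d)                 # the "encoded" feature axis
labels = rng.integers(0, 2, size=n)            # binary property to probe
reps = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, direction)

# Linear probe: logistic regression trained by plain gradient descent.
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(reps @ w + b)))  # predicted probabilities
    w -= lr * (reps.T @ (p - labels)) / n      # gradient of log loss w.r.t. w
    b -= lr * float(np.mean(p - labels))       # gradient w.r.t. bias

acc = float(np.mean((reps @ w + b > 0) == labels))
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy is then taken as evidence that the property is represented at that layer, though, as the lecture's broader discussion of evaluation suggests, such results must be compared against control baselines to rule out the probe itself doing the work.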
Syllabus
UUtah CS 6966 Interpretability of LLMs | Spring 2026 | Probing: Part 2
Taught by
UofU Data Science