Model Diffing with Crosscoders - CS 6966 Interpretability of LLMs Spring 2026

Explore crosscoder-based model diffing, a technique for comparing large language models, in this 44-minute lecture from the University of Utah's CS 6966 course on LLM interpretability. Learn how crosscoder architectures are used to systematically analyze differences between language models, including variations in behavior, representations, and decision-making. The lecture covers methods for comparing different LLM versions or architectures and shows how crosscoders can reveal differences in model internals that traditional comparison methods might miss. These interpretability approaches help explain how changes in training, architecture, or data affect LLM performance and behavior.