Overview
Explore the critical field of AI interpretability in this 59-minute conference talk, which examines how Large Language Models actually process and represent information. Survey recent research, including Anthropic's 2024 work on mechanistic interpretability and scaling monosemanticity, while learning to distinguish between explainability and interpretability and their respective applications. Discover why traditional methodologies aimed at explaining models fall short for today's LLMs, and review the current toolkit for interpreting black-box models. Debunk common myths about LLMs, including the tendency to anthropomorphize them, and see why interpretable AI has become essential for reliability and trust in modern AI systems. Gain practical strategies for applying interpretability techniques to real-world AI applications, whether you're integrating LLMs into software or advancing AI research, and learn how interpretability helps debug unexpected model behavior, improve efficiency, and keep AI systems aligned with human goals.
Syllabus
Between the Layers – Interpreting Large Language Models - Michelle Frost - NDC AI 2025
Taught by
NDC Conferences