The Mean-Field Dynamics of Transformers
Centre International de Rencontres Mathématiques via YouTube
Overview
Explore a mathematical framework that interprets Transformer attention mechanisms as interacting particle systems in this 51-minute conference talk. Delve into the continuum (mean-field) limits of these systems by examining attention dynamics on the sphere and their connections to Wasserstein gradient flows, synchronization models such as Kuramoto, and mean-shift clustering.

Discover the central finding: a global clustering phenomenon in which tokens asymptotically collapse to a single cluster, often only after long metastable periods spent in multi-cluster arrangements. Learn about the tractable equiangular reduction that enables exact clustering-rate calculations, understand how normalization schemes affect contraction speeds, and examine the phase transition that occurs in long-context attention. Gain insight into the mechanisms driving representation collapse, and identify regimes that preserve expressive, multi-cluster structure in deep attention architectures, providing a mathematical foundation for understanding modern Transformer models.
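To make the "tokens as interacting particles" picture concrete, here is a minimal NumPy sketch of self-attention dynamics on the unit sphere: each token moves toward a softmax-weighted average of all tokens, projected onto the tangent space so it stays on the sphere. This is an illustrative discretization of the kind of model discussed in the talk, not the speaker's exact formulation; the step size `dt`, inverse temperature `beta`, and time horizon are assumptions chosen for demonstration.

```python
import numpy as np

def attention_dynamics(X, beta=1.0, dt=0.05, steps=2000):
    """Euler integration of attention-style particle dynamics on the sphere.

    X: (n, d) array of token positions, each row a unit vector.
    beta: inverse-temperature parameter in the attention softmax (assumed).
    """
    for _ in range(steps):
        # Attention weights: row-wise softmax of beta * <x_i, x_j>.
        A = np.exp(beta * (X @ X.T))
        A /= A.sum(axis=1, keepdims=True)
        V = A @ X                                   # weighted average of tokens
        # Project the drift onto the tangent space of the sphere at each x_i.
        V -= np.sum(V * X, axis=1, keepdims=True) * X
        X = X + dt * V
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # stay on the sphere
    return X

rng = np.random.default_rng(0)
n, d = 16, 3
X0 = rng.normal(size=(n, d))
X0 /= np.linalg.norm(X0, axis=1, keepdims=True)
XT = attention_dynamics(X0.copy())
```

Running this from random initial tokens, the pairwise inner products `XT @ XT.T` drift toward 1, illustrating the global clustering phenomenon; with larger `beta`, the system tends to linger in metastable multi-cluster configurations before collapsing.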
Syllabus
Philippe Rigollet: The mean-field dynamics of transformers
Taught by
Centre International de Rencontres Mathématiques