The Mean-Field Dynamics of Transformers
Centre International de Rencontres Mathématiques via YouTube
Overview
This 51-minute conference talk presents a mathematical framework that interprets Transformer attention mechanisms as interacting particle systems. It studies the continuum (mean-field) limits of these systems by examining attention dynamics on the sphere, and connects them to Wasserstein gradient flows, synchronization models such as Kuramoto, and mean-shift clustering. The central finding is a global clustering phenomenon: tokens cluster asymptotically, but only after passing through long metastable states in which they are arranged into multiple clusters. The talk introduces a tractable equiangular reduction that yields exact clustering rates, shows how different normalization schemes change contraction speeds, and describes a phase transition that occurs in long-context attention. Together, these results clarify the mechanisms that drive representation collapse and identify the regimes in which deep attention architectures preserve expressive, multi-cluster structure.
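The interacting-particle picture can be illustrated with a small numerical experiment. The sketch below is not taken from the talk. It assumes a simplified model in which each token is a point on the unit sphere, the query, key, and value matrices are the identity, and each token moves toward the softmax-weighted average of all tokens, with the radial part of that motion removed so the token stays on the sphere. The function names, the inverse temperature `beta`, the step size `dt`, and the hemisphere initialization are all illustrative choices. A forward-Euler step with renormalization stands in for the continuous-time flow.

```python
import numpy as np


def attention_velocity(x, beta):
    """Tangential velocity of each token under softmax attention (assumed model)."""
    logits = beta * (x @ x.T)                      # beta * <x_i, x_j>
    logits -= logits.max(axis=1, keepdims=True)    # stabilise the exponentials
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    drift = weights @ x                            # attention-weighted token average
    radial = np.sum(drift * x, axis=1, keepdims=True) * x
    return drift - radial                          # project onto the tangent space at x_i


def simulate(x0, beta=1.0, dt=0.1, steps=2000):
    """Forward-Euler integration, renormalising tokens onto the sphere after each step."""
    x = x0 / np.linalg.norm(x0, axis=1, keepdims=True)
    for _ in range(steps):
        x = x + dt * attention_velocity(x, beta)
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


def min_pairwise_cosine(x):
    """Smallest inner product between any two tokens; 1.0 means a single cluster."""
    return float((x @ x.T).min())


rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 3))      # 8 tokens in dimension 3
x0[:, 0] = np.abs(x0[:, 0])           # assumption: start inside one hemisphere
x0 /= np.linalg.norm(x0, axis=1, keepdims=True)

x_final = simulate(x0)
print("min cosine before:", min_pairwise_cosine(x0))
print("min cosine after: ", min_pairwise_cosine(x_final))
```

With a small `beta`, the tokens contract quickly, and the minimum pairwise cosine approaches 1. Raising `beta` sharpens the attention weights and is one way to look for the slower, multi-cluster transients the description refers to, though this toy setup makes no claim about the rates derived in the talk.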
Syllabus
Philippe Rigollet: The mean-field dynamics of transformers
Taught by
Centre International de Rencontres Mathématiques