The Mean-Field Dynamics of Transformers
Centre International de Rencontres Mathématiques via YouTube
Overview
Explore a mathematical framework that interprets Transformer attention as an interacting particle system in this 51-minute conference talk. Delve into the continuum (mean-field) limits of these systems by examining attention dynamics on the sphere and their connections to Wasserstein gradient flows, synchronization models such as Kuramoto, and mean-shift clustering. Discover the central finding: a global clustering phenomenon in which tokens asymptotically collapse to a single cluster, but only after passing through long metastable states that sustain multiple cluster arrangements. Learn about the tractable equiangular reduction that enables exact computation of clustering rates, understand how normalization schemes affect contraction speeds, and examine the phase transition that arises in long-context attention. Gain insight into the mechanisms driving representation collapse and into the regimes that preserve expressive, multi-cluster structure in deep attention architectures, providing mathematical foundations for understanding modern Transformer models.
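The particle-system view described above can be illustrated with a small simulation. The sketch below follows the commonly studied self-attention dynamics on the unit sphere, where each token moves toward a softmax-weighted average of the others, projected onto the tangent space of the sphere. This is an illustrative toy (the parameter names `beta`, `dt`, and `steps`, and the specific values chosen, are assumptions, not taken from the talk); it simply shows the mean pairwise cosine similarity increasing as tokens cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 8, 3          # n tokens as particles on the sphere S^{d-1}
beta = 4.0           # inverse temperature (attention sharpness); illustrative value
dt, steps = 0.1, 2000

# random initial tokens, normalized to the unit sphere
x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)

def mean_cos(x):
    """Average pairwise cosine similarity; approaches 1 as tokens cluster."""
    g = x @ x.T
    return (g.sum() - n) / (n * (n - 1))

c0 = mean_cos(x)
for _ in range(steps):
    w = np.exp(beta * (x @ x.T))            # attention scores
    w /= w.sum(axis=1, keepdims=True)       # softmax normalization over keys
    v = w @ x                               # attended average of the other tokens
    v -= np.sum(v * x, axis=1, keepdims=True) * x  # project onto tangent space
    x = x + dt * v                          # Euler step
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract back to the sphere
c1 = mean_cos(x)

print(f"mean cosine: {c0:.3f} -> {c1:.3f}")
```

Rerunning with different seeds or a larger number of tokens makes the metastability mentioned in the talk visible: the mean cosine can plateau at an intermediate multi-cluster value for many steps before the final collapse.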
Syllabus
Philippe Rigollet: The mean-field dynamics of transformers
Taught by
Centre International de Rencontres Mathématiques