The Mean-Field Dynamics of Transformers

Centre International de Rencontres Mathématiques via YouTube

Overview

This 51-minute conference talk presents a mathematical framework that interprets Transformer attention as a system of interacting particles, with tokens playing the role of particles. It studies the continuum (mean-field) limit of these systems through attention dynamics on the sphere and connects them to Wasserstein gradient flows, synchronization models such as the Kuramoto model, and mean-shift clustering. The central result is a global clustering phenomenon: tokens asymptotically cluster, but only after long metastable phases in which several clusters coexist. The talk also covers a tractable equiangular reduction that yields exact clustering rates, the effect of normalization schemes on contraction speed, and a phase transition that arises in long-context attention. Together, these results explain what drives representation collapse and identify regimes in which deep attention architectures retain expressive, multi-cluster structure.
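A minimal sketch of the clustering phenomenon, under stated assumptions: this page does not give the talk's exact model, so the code assumes a simplified self-attention flow on the unit sphere in which the query, key, and value matrices are all the identity and `beta` is an inverse-temperature parameter. Each token x_i moves along the sphere toward a softmax-weighted average of all tokens, dx_i/dt = P_{x_i}(sum_j w_ij x_j) with w_ij proportional to exp(beta * <x_i, x_j>), where P_{x_i} projects onto the tangent space at x_i. The function name `simulate_attention` and all default parameter values are hypothetical, and the explicit Euler step followed by renormalization is simply one convenient discretization, not necessarily the one used in the talk.

```python
import numpy as np

def simulate_attention(n=32, d=3, beta=1.0, dt=0.05, t_max=60.0, seed=0):
    """Integrate a simplified attention flow on the unit sphere S^{d-1}.

    Assumptions (hypothetical, for illustration): identity query/key/value
    matrices, softmax weights with inverse temperature `beta`, and explicit
    Euler steps followed by renormalization onto the sphere.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # random initial tokens on the sphere

    for _ in range(int(round(t_max / dt))):
        logits = beta * (x @ x.T)                    # beta * <x_i, x_j>
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)            # row i: attention weights of token i
        v = w @ x                                    # attention output for each token
        # Project onto the tangent space at x_i: v_i - <v_i, x_i> x_i
        v -= np.sum(v * x, axis=1, keepdims=True) * x
        x = x + dt * v
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract back onto the sphere
    return x

tokens = simulate_attention()
gram = tokens @ tokens.T  # pairwise inner products; values near 1 mean tokens coincide
print("smallest pairwise inner product:", gram.min())
```

Increasing `beta` concentrates each token's attention on its nearest neighbors. Experimenting with larger values is one way to probe the slower, multi-cluster transients the talk describes, though this sketch makes no claim about their exact timescales.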

Syllabus

Philippe Rigollet: The mean-field dynamics of transformers

Taught by

Centre International de Rencontres Mathématiques
