Attention Layers and Single-Location Regression - A Theoretical Analysis
Centre International de Rencontres Mathématiques via YouTube
Overview
Watch a 45-minute conference talk on the theoretical foundations of attention-based models in machine learning, delivered at the Centre International de Rencontres Mathématiques in Marseille, France. Explore the single-location regression task, in which the output depends on a single token within a sequence and the position of that token is determined through a linear projection. Learn about a simplified non-linear self-attention layer used as a predictor, which is shown to be asymptotically Bayes optimal and to learn effectively even though the underlying optimization problem is non-convex. Understand how attention mechanisms handle sparse token information and internal linear structure, contributing to the theoretical understanding of models such as the Transformer. Access the presentation through CIRM's Audiovisual Mathematics Library, which provides chapter markers, keywords, abstracts, bibliographies, and Mathematics Subject Classification codes for easier navigation.
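To make the task description above more concrete, here is a minimal toy sketch in plain Python. It is not the construction from the talk: the data model (Gaussian tokens, with the informative token shifted along a direction `k_star`), the use of `erf` as the non-linearity, and all names (`make_sample`, `locate`, `attention_predict`, `shift`, `sharpness`) are assumptions chosen only for illustration.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_sample(num_tokens, dim, k_star, v_star, rng, shift=4.0):
    """Draw one toy sample: Gaussian tokens, one of which carries the output."""
    location = rng.randrange(num_tokens)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
    # Assumption: the informative token is shifted along k_star, so that a
    # linear projection onto k_star tends to reveal where it sits.
    tokens[location] = [x + shift * k for x, k in zip(tokens[location], k_star)]
    # The output depends on the informative token only.
    y = dot(tokens[location], v_star)
    return tokens, y, location


def locate(tokens, k):
    """Guess the informative position as the token with the largest projection on k."""
    return max(range(len(tokens)), key=lambda i: dot(tokens[i], k))


def attention_predict(tokens, k, v, sharpness=1.0):
    """Simplified attention-style predictor: each token's linear readout
    <x, v> is weighted by a non-linear score erf(sharpness * <x, k>).
    The erf non-linearity is an assumption of this sketch."""
    return sum(math.erf(sharpness * dot(x, k)) * dot(x, v) for x in tokens)


rng = random.Random(0)
k_star = [1.0, 0.0, 0.0]  # hypothetical "where to look" direction
v_star = [0.0, 1.0, 0.0]  # hypothetical "what to read" direction
tokens, y, location = make_sample(5, 3, k_star, v_star, rng)
print("true location:", location, "guessed:", locate(tokens, k_star))
print("target:", y, "prediction:", attention_predict(tokens, k_star, v_star))
```

In this sketch the parameters `k` and `v` are simply set to the true directions; the learning question discussed in the talk, namely whether such parameters can be found by optimization despite non-convexity, is not addressed here.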
Syllabus
Claire Boyer: Attention layers provably solve single-location regression
Taught by
Centre International de Rencontres Mathématiques