Exploiting Low-Dimensional Data Structures and Understanding Neural Scaling Laws of Transformers
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Explore the theoretical foundations behind transformer scaling laws in this 41-minute conference presentation from IPAM's Scientific Machine Learning Workshop. Discover how low-dimensional data structures in language datasets can explain why transformer-based large language models exhibit predictable power-law scaling with respect to model size and data size. Learn about estimating the intrinsic dimension of language datasets, and examine statistical estimation and mathematical approximation theories for transformers that predict these scaling phenomena. Understand how exploiting low-dimensional data structures yields insights into transformer behavior that respect the geometry of the data, and review empirical validation on trained language models showing strong agreement between observed scaling laws and theoretical predictions.
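As background for the power-law scaling discussed in the talk, the sketch below shows how such a law is typically fit in practice: a relation of the form L(N) = c · N^(−α) becomes linear in log-log space, so α and c can be recovered by ordinary least squares. This is a generic illustration, not code from the presentation; the synthetic data, function name, and exponent are illustrative assumptions.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ≈ c * x**(-alpha) by least squares in log-log space.

    Assumes x and y are positive arrays. Returns (alpha, c).
    """
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return -slope, np.exp(intercept)

# Synthetic "loss vs. model size" data following L(N) = 2.0 * N**(-0.5).
# (Hypothetical values chosen only to illustrate the fitting procedure.)
N = np.array([1e6, 1e7, 1e8, 1e9])
L = 2.0 * N ** -0.5

alpha, c = fit_power_law(N, L)
print(alpha, c)  # recovers the exponent 0.5 and prefactor 2.0
```

In empirical scaling-law studies, the same fit is applied separately to loss as a function of model size and of dataset size; theories like the one in this talk aim to predict the exponent α from properties of the data, such as its intrinsic dimension.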
Syllabus
Wenjing Liao - Exploiting Low-Dimensional Data Structure & Understanding Neural Scaling of Trans...
Taught by
Institute for Pure & Applied Mathematics (IPAM)