Exploiting Low-Dimensional Data Structures and Understanding Neural Scaling Laws of Transformers
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Explore the theoretical foundations behind transformer scaling laws in this 41-minute conference presentation from IPAM's Scientific Machine Learning Workshop. Discover how low-dimensional structure in language datasets can explain why transformer-based large language models exhibit predictable power-law scaling with respect to model size and data size. Learn how the intrinsic dimension of language datasets is estimated, and examine statistical estimation and mathematical approximation theories for transformers that predict these scaling laws. Understand how exploiting low-dimensional data structure yields a theory of transformer behavior that respects data geometry, and review empirical validation on trained language models showing strong agreement between observed scaling laws and the theoretical predictions.
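The listing includes no materials, but the two quantities the overview leans on, the intrinsic dimension of a dataset and a power-law fit of loss against model or data size, are easy to illustrate. Below is a minimal Python sketch assuming the two-nearest-neighbor (TwoNN) estimator of Facco et al. and an ordinary log-log least-squares fit; these are standard tools, not necessarily the methods used in the talk, and all names and constants are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_intrinsic_dimension(X):
    """Two-NN intrinsic dimension estimate (Facco et al., 2017).

    The ratio mu = r2/r1 of each point's second- to first-nearest-neighbor
    distance is approximately Pareto with shape parameter equal to the
    intrinsic dimension d; the maximum-likelihood estimate is n / sum(log mu).
    """
    # k=3: the nearest hit is the point itself (distance 0), then r1, r2.
    dists, _ = cKDTree(X).query(X, k=3)
    mu = dists[:, 2] / dists[:, 1]
    return len(X) / np.log(mu).sum()

def fit_power_law(n, loss):
    """Fit loss = a * n**(-alpha) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    return np.exp(intercept), -slope  # (a, alpha)

rng = np.random.default_rng(0)

# Points on a 2-D manifold linearly embedded in 50 ambient dimensions:
# the estimator should recover roughly 2, not 50.
latent = rng.normal(size=(2000, 2))
X = latent @ rng.normal(size=(2, 50))
print(f"estimated intrinsic dimension: {twonn_intrinsic_dimension(X):.2f}")

# Synthetic losses following L(N) = 3 * N**(-0.4), with mild noise.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
L = 3.0 * N ** -0.4 * np.exp(0.01 * rng.normal(size=N.size))
a, alpha = fit_power_law(N, L)
print(f"fitted scaling law: L(N) ~= {a:.2f} * N^(-{alpha:.3f})")
```

The estimated dimension tracks the latent dimension (2) rather than the ambient dimension (50), which is the sense in which a scaling theory can "respect data geometry": the predicted exponents depend on the intrinsic rather than the ambient dimension.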
Syllabus
Wenjing Liao - Exploiting Low-Dimensional Data Structure & Understanding Neural Scaling of Transformers
Taught by
Institute for Pure & Applied Mathematics (IPAM)