Exploiting Low-Dimensional Data Structures and Understanding Neural Scaling Laws of Transformers
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Explore the theoretical foundations behind transformer scaling laws in this 41-minute conference presentation from IPAM's Scientific Machine Learning Workshop. Discover how low-dimensional data structures in language datasets can explain why transformer-based large language models exhibit predictable power-law scaling with respect to model size and data size. Learn about estimating the intrinsic dimension of language datasets, and examine statistical estimation and mathematical approximation theories for transformers that predict these scaling phenomena. Understand how exploiting low-dimensional data structures yields insights into transformer behavior that respect the geometry of the data, and review empirical validation on trained language models showing strong agreement between observed scaling laws and theoretical predictions.
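As background for the power-law scaling discussed in the talk, the sketch below shows how such a law is typically fit in practice: a relation of the form L(N) = c · N^(−α) becomes linear in log-log space, so α and c can be recovered by ordinary least squares. This is a generic illustration, not code from the presentation; the synthetic data, function name, and exponent are illustrative assumptions.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ≈ c * x**(-alpha) by least squares in log-log space.

    Assumes x and y are positive arrays. Returns (alpha, c).
    """
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return -slope, np.exp(intercept)

# Synthetic "loss vs. model size" data following L(N) = 2.0 * N**(-0.5).
# (Hypothetical values chosen only to illustrate the fitting procedure.)
N = np.array([1e6, 1e7, 1e8, 1e9])
L = 2.0 * N ** -0.5

alpha, c = fit_power_law(N, L)
print(alpha, c)  # recovers the exponent 0.5 and prefactor 2.0
```

In empirical scaling-law studies, the same fit is applied separately to loss as a function of model size and of dataset size; theories like the one in this talk aim to predict the exponent α from properties of the data, such as its intrinsic dimension.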
Syllabus
Wenjing Liao - Exploiting Low-Dimensional Data Structure & Understanding Neural Scaling of Trans...
Taught by
Institute for Pure & Applied Mathematics (IPAM)