Theory of Scaling: A General Framework for Scale-aware Training

In this one-hour lecture, Soufiane Hayou of UC Berkeley presents a theoretical framework for efficient learning at large scale in deep neural networks. Explore how to derive learning rules that adjust automatically to model scale, keeping training stable and performance high in neural architectures with billions of parameters. Learn about the fundamental principles governing neural networks as Hayou addresses the challenge of optimizing training across different scales and moves beyond the common practice of relying on extrapolated scaling rules. Gain practical guidelines for training neural networks efficiently, along with new insights into how scale-aware training can improve model performance.