Overview
Explore a mathematical seminar presentation that introduces central flows as a theoretical framework for understanding optimization dynamics in deep learning. Discover how traditional optimization theories fall short in describing the complex, oscillatory behavior that occurs during neural network training, particularly in the "edge of stability" regime where optimizers typically operate. Learn about the key insight that while exact trajectories of oscillatory optimizers are difficult to analyze, their time-averaged or smoothed trajectories become much more tractable. Examine how central flows—differential equations that characterize these time-averaged trajectories—can predict long-term optimization paths for generic neural networks with remarkable numerical accuracy. Understand the mechanisms by which gradient descent makes progress even when loss values occasionally increase, how adaptive optimizers adjust to local loss landscapes, and how these optimizers implicitly guide training toward regions that allow for larger step sizes. Gain insights into this novel theoretical tool that bridges the gap between optimization theory and the practical realities of deep learning training dynamics.
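To make the time-averaging idea concrete, here is a minimal numerical sketch in Python. The toy loss, step size, and initialization below are illustrative assumptions of this summary, not taken from the talk: gradient descent on L(a, b) = 0.5(ab - 1)^2 starts slightly sharper than the 2/eta stability threshold, oscillates, and self-stabilizes, while a simple two-step average of the iterates traces a much smoother path.

```python
import numpy as np

# Minimal sketch of the time-averaging idea on a toy loss chosen for
# illustration (this example is an assumption of this summary, not from
# the talk):
#   L(a, b) = 0.5 * (a * b - 1) ** 2
# On the solution manifold a*b = 1, the top Hessian eigenvalue
# ("sharpness") is a**2 + b**2, and plain gradient descent is stable
# only while the sharpness stays below 2/eta (the "edge of stability").
eta = 0.11                    # step size; edge of stability at 2/eta ~ 18.18
a, b = 4.3, 1 / 4.3 + 1e-3    # initial sharpness ~ 18.6: slightly too sharp

traj_b, losses, sharp = [], [], []
for _ in range(3000):
    r = a * b - 1.0                           # residual
    a, b = a - eta * r * b, b - eta * r * a   # one gradient descent step
    traj_b.append(b)
    losses.append(0.5 * r ** 2)
    sharp.append(a ** 2 + b ** 2)

traj_b = np.array(traj_b)
# Averaging adjacent iterates cancels the period-2 oscillation, leaving
# the smooth "time-averaged" trajectory that central flows describe.
smooth_b = 0.5 * (traj_b[1:] + traj_b[:-1])

print("raw loss increased on some steps:", bool((np.diff(losses) > 0).any()))
print("largest raw step in b:          ", np.abs(np.diff(traj_b)).max())
print("largest time-averaged step in b:", np.abs(np.diff(smooth_b)).max())
print(f"final sharpness {sharp[-1]:.2f} (edge of stability 2/eta = {2/eta:.2f})")
```

In this toy run the raw loss rises on some steps and the per-step movement of b is dominated by the oscillation, yet the averaged iterates move smoothly and the sharpness settles near the 2/eta edge, which is the qualitative picture the central-flow framework makes precise for generic networks.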
Syllabus
Alex Damian | Understanding Optimization in Deep Learning with Central Flows
Taught by
Harvard CMSA