SGD Exact Dynamics in High-Dimension - Insights for Algorithm Design
HUJI Machine Learning Club via YouTube
Overview
Explore the theoretical foundations of stochastic gradient descent (SGD) in high-dimensional settings through this machine learning lecture, which shows how SGD dynamics converge to low-dimensional ordinary differential equations (ODEs). Discover a unified framework for analyzing SGD behavior across generalized linear models and multi-index problems trained on Gaussian data with general covariance, a setting that encompasses important models such as logistic regression, phase retrieval, and two-layer neural networks. Learn how this theoretical approach sheds light on the surprising practical effectiveness of the stochastic optimization methods central to modern machine learning.

Examine two key applications of the framework: first, how data anisotropy influences the behavior and performance of stochastic adaptive methods, including line search and AdaGrad-Norm; second, how analyzing differentially private SGD with gradient clipping yields improved risk-estimation error rates in the challenging regime of aggressive clipping. Gain insights from research that bridges machine learning, statistical physics, and high-dimensional probability to inform algorithm-design principles for stochastic optimization in contemporary machine learning applications.
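To make the central idea concrete, here is a minimal sketch, assuming online SGD for logistic regression on isotropic Gaussian data with a unit-norm teacher vector `w_star` (all names, constants, and scalings below are illustrative choices, not taken from the lecture). It tracks the summary statistics m_t = ⟨w_t, w*⟩ and q_t = ‖w_t‖², which, with the step size scaled as O(1/d) and time rescaled by the dimension d, concentrate around a deterministic ODE as d grows:

```python
# Minimal sketch: online SGD for logistic regression on Gaussian data,
# logging the low-dimensional summary statistics whose trajectory
# concentrates on an ODE in high dimension. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d = 4000                       # ambient dimension (assumed)
lr = 2.0 / d                   # step size scaled as O(1/d)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)   # unit-norm teacher direction (assumed)
w = np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for t in range(12 * d):        # O(d) SGD steps = O(1) units of ODE time
    x = rng.standard_normal(d)             # fresh Gaussian sample (online SGD)
    y = rng.binomial(1, sigmoid(w_star @ x))   # well-specified logistic labels
    g = (sigmoid(w @ x) - y) * x           # stochastic logistic-loss gradient
    w -= lr * g
    if t % (2 * d) == 0:
        m = w @ w_star                     # alignment with the teacher
        q = w @ w                          # squared norm
        print(f"ODE time {t / d:5.1f}:  m = {m:+.3f}  q = {q:.3f}")
```

Running this for increasing d shows the printed (m, q) trajectories becoming nearly deterministic, which is the concentration phenomenon the lecture formalizes.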
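The two applications can be sketched in the same setting. Below, an AdaGrad-Norm step adapts a single scalar step size from accumulated squared gradient norms, and a DP-SGD step clips each per-sample gradient before adding Gaussian noise; aggressive clipping corresponds to a threshold well below typical gradient norms. The constants `eta`, `b0`, `clip_c`, and `sigma` are assumptions for illustration, not values from the talk:

```python
# Hedged sketches of the two update rules discussed as applications:
# AdaGrad-Norm (scalar adaptive step size) and DP-SGD (per-sample
# gradient clipping plus Gaussian noise). Illustrative constants only.
import numpy as np

rng = np.random.default_rng(1)
d = 4000

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adagrad_norm_step(w, g, state, eta=1.0):
    """AdaGrad-Norm: step size eta / sqrt(b_t), where b_t accumulates
    squared gradient norms (the initial b0 is an assumed value)."""
    state["b"] += g @ g
    return w - (eta / np.sqrt(state["b"])) * g

def dp_sgd_step(w, g, lr, clip_c=0.5, sigma=0.1):
    """DP-SGD: clip the per-sample gradient to norm <= clip_c, then add
    isotropic Gaussian noise scaled by sigma * clip_c."""
    g = g * min(1.0, clip_c / (np.linalg.norm(g) + 1e-12))  # clipping
    g = g + sigma * clip_c * rng.standard_normal(g.shape[0])  # privacy noise
    return w - lr * g

# Run both variants from the same initialization on synthetic data.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
w_ada, w_dp = np.zeros(d), np.zeros(d)
state = {"b": 1e-3}            # assumed b0
for t in range(6 * d):
    x = rng.standard_normal(d)
    y = rng.binomial(1, sigmoid(w_star @ x))
    g_ada = (sigmoid(w_ada @ x) - y) * x
    g_dp = (sigmoid(w_dp @ x) - y) * x
    w_ada = adagrad_norm_step(w_ada, g_ada, state)
    w_dp = dp_sgd_step(w_dp, g_dp, lr=2.0 / d)
print(f"alignment  AdaGrad-Norm: {w_ada @ w_star:+.3f}   "
      f"DP-SGD: {w_dp @ w_star:+.3f}")
```

In this sketch, shrinking `clip_c` relative to typical gradient norms reproduces the aggressive-clipping regime in which the lecture's framework gives sharper risk-estimation guarantees.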
Syllabus
Thursday, December 18th, 2025, AM, room C221
Taught by
HUJI Machine Learning Club