A Framework for Designing Non-Diagonal Adaptive Training Methods
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Explore a 49-minute conference talk by Wu Lin of the Vector Institute, presented at IPAM's Theory and Practice of Deep Learning Workshop. Delve into a framework for designing non-diagonal adaptive training methods for deep learning optimization. Discover how a probabilistic reformulation of the optimization problem can exploit the Fisher-Rao geometric structure of families of probability distributions, yielding new quasi-Newton methods for large-scale neural network training. Examine the second-order perspective on adaptive methods such as RMSProp and full-matrix AdaGrad. Understand the concept of preconditioner invariance and how it makes non-diagonal adaptive methods inverse-free while preserving preconditioner structure, a requirement for modern low-precision mini-batch training. Investigate Kronecker-factored adaptive methods as a bridge between non-diagonal and diagonal adaptive methods. Gain insight into why these methods suit training large neural networks in half precision: they eliminate numerically unstable and computationally intensive matrix decompositions and inversions.
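To make the second-order perspective concrete, here is a minimal NumPy sketch contrasting a diagonal RMSProp-style step with a full-matrix AdaGrad-style step. The hyperparameters (lr, beta, eps) and the toy gradient are illustrative assumptions, not details taken from the talk.

```python
import numpy as np

def rmsprop_step(w, g, v, lr=1e-2, beta=0.9, eps=1e-8):
    """Diagonal adaptive step: v tracks an exponential moving average
    of coordinate-wise squared gradients."""
    v = beta * v + (1 - beta) * g**2
    return w - lr * g / (np.sqrt(v) + eps), v

def full_matrix_adagrad_step(w, g, G, lr=1e-2, eps=1e-8):
    """Non-diagonal adaptive step: G accumulates gradient outer
    products, and the update applies the inverse matrix square root
    of G via an eigendecomposition."""
    G = G + np.outer(g, g)
    evals, evecs = np.linalg.eigh(G + eps * np.eye(len(g)))
    P = evecs @ np.diag(evals**-0.5) @ evecs.T   # G^{-1/2}
    return w - lr * P @ g, G

w, v, G = np.zeros(3), np.zeros(3), np.zeros((3, 3))
g = np.array([0.5, -1.0, 2.0])                   # a toy gradient
w_diag, v = rmsprop_step(w, g, v)
w_full, G = full_matrix_adagrad_step(w, g, G)
```

The eigendecomposition and inversion in the full-matrix step are exactly the operations that become unstable in half precision, which is what the inverse-free methods discussed in the talk aim to avoid.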
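In the same spirit, a hedged sketch of the Kronecker-factored idea, using standard Kronecker-product algebra rather than the talk's specific update rules: two small factors precondition a matrix-shaped gradient without ever forming the full preconditioner. The factor statistics and damping value below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
G = rng.normal(size=(m, n))            # gradient of an (m x n) weight matrix

# Small Kronecker factors (assumed statistics, damped to stay invertible).
A = G @ G.T / n + 1e-3 * np.eye(m)     # row-space factor, (m x m)
B = G.T @ G / m + 1e-3 * np.eye(n)     # column-space factor, (n x n)

# Preconditioned gradient computed with the small factors only:
# (A kron B)^{-1} vec(G) == vec(A^{-1} G B^{-1}) for symmetric B.
precond_small = np.linalg.solve(A, G) @ np.linalg.inv(B)

# Check against the explicit (mn x mn) Kronecker preconditioner.
full = np.kron(A, B)
precond_full = np.linalg.solve(full, G.reshape(-1)).reshape(m, n)
print(np.allclose(precond_small, precond_full))  # True
```

Working with the (m x m) and (n x n) factors instead of the (mn x mn) matrix is what lets Kronecker-factored methods scale between the diagonal and full-matrix extremes.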
Syllabus
Wu Lin - A framework for designing (non-diagonal) adaptive training methods - IPAM at UCLA
Taught by
Institute for Pure & Applied Mathematics (IPAM)