One of Three Theoretical Puzzles - Generalization in Deep Networks

Explore the theoretical puzzle of generalization in deep networks in this lecture by Tomaso Poggio of MIT. Learn how deep networks can avoid the curse of dimensionality for compositional functions, how classification error is minimized through a surrogate function, and what motivates generalization bounds for regression. Examine gradient descent on an unconstrained optimization problem as a gradient dynamical system, illustrated with a Lagrange multiplier example. See how an explicit norm constraint gives rise to weight normalization, and why overparameterized networks can fit the data and still generalize. The lecture closes with gradient descent for deep ReLU networks.

Syllabus

Intro
Deep Networks can avoid the curse of dimensionality for compositional functions
Minimize classification error by minimizing a surrogate function
Motivation: generalization bounds for regression
GD for unconstrained optimization: gradient dynamical system
Example: Lagrange multiplier
Explicit norm constraint gives weight normalization
Overparameterized networks fit the data and generalize
Gradient Descent for deep ReLU networks