Overview
Explore the theoretical puzzle of generalization in deep networks in this lecture by Tomaso Poggio of MIT. Topics include how deep networks can avoid the curse of dimensionality for compositional functions, minimizing a surrogate function as a proxy for classification error, and the motivation behind generalization bounds for regression. The lecture treats gradient descent on an unconstrained optimization problem as a gradient dynamical system, with a Lagrange multiplier example, and shows how an explicit norm constraint gives weight normalization. It also addresses why overparametrized networks can fit the data and still generalize, and covers gradient descent for deep ReLU networks.
Syllabus
Intro
Deep Networks can avoid the curse of dimensionality for compositional functions
Minimize classification error: minimize surrogate function
Motivation: generalization bounds for regression
GD: unconstrained optimization, gradient dynamical system
Example: Lagrange multiplier
Explicit norm constraint gives weight normalization
Overparametrized networks fit the data and generalize
Gradient Descent for deep ReLU networks
Taught by
MITCBMM