
YouTube

Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf's Law

Centre de recherches mathématiques - CRM via YouTube

Overview

Explore optimization challenges in transformer-based language models through this mathematical research lecture, which examines scaling laws for gradient descent and sign descent applied to linear bigram models under Zipf's law. Delve into the theoretical reasons why gradient descent struggles with the first and last layers of language models, particularly under the heavy-tailed word distributions of natural language, where the frequency of the k-th most common word decays as 1/k.

Learn how the power-law distribution of tokens, parameterized by an exponent α, affects training performance, and discover why the case α = 1 observed in real text represents a "worst-case" scenario for gradient descent.

Understand the mathematical derivation of scaling laws showing that, for Zipf-distributed data, gradient descent requires a number of iterations scaling almost linearly with the dimension, while sign descent (serving as a proxy for the Adam optimizer) achieves significantly better performance, with iterations scaling only with the square root of the dimension. Gain insight into the theoretical underpinnings of why optimizers like Adam outperform gradient descent on natural language processing tasks, moving beyond the typical assumption that eigenvalues decay with exponent α > 1 to the more challenging heavy-tailed distributions encountered in practice.
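The contrast described above can be illustrated numerically. The sketch below is not the lecture's actual analysis: it uses an assumed toy diagonal quadratic whose Hessian eigenvalues follow Zipf's law with α = 1 (standing in for token frequencies in a linear bigram model), an illustrative random optimum `w_star`, and simple step-size choices, and it counts how many iterations each method needs to shrink the loss by a fixed factor as the dimension grows.

```python
import numpy as np

def iterations_to_converge(d, method, tol=1e-2, max_iters=200_000):
    """Iterations until the loss drops below tol * initial loss.

    Toy diagonal quadratic whose Hessian spectrum follows Zipf's law
    with alpha = 1 (h_k proportional to 1/k) -- an illustrative
    stand-in for the token-frequency structure of a bigram model.
    """
    h = 1.0 / np.arange(1, d + 1)          # Zipf eigenvalues, alpha = 1
    rng = np.random.default_rng(0)
    w_star = rng.uniform(0.5, 2.0, d)      # hypothetical optimum (assumption)
    w = np.zeros(d)
    loss0 = 0.5 * np.sum(h * w_star ** 2)
    for t in range(1, max_iters + 1):
        grad = h * (w - w_star)
        if method == "gd":
            w -= grad / h[0]               # step 1/L, L = largest eigenvalue
        else:                              # sign descent (Adam proxy)
            w -= np.sign(grad) / np.sqrt(t)  # decaying step size
        loss = 0.5 * np.sum(h * (w - w_star) ** 2)
        if loss <= tol * loss0:
            return t
    return max_iters

for d in (64, 256, 1024):
    print(f"d={d:5d}  gd={iterations_to_converge(d, 'gd'):6d}  "
          f"sign={iterations_to_converge(d, 'sign'):6d}")
```

In this toy setting, the iteration count for gradient descent grows roughly with the dimension d (the smallest eigenvalue 1/d dominates), while sign descent stays far cheaper, echoing the qualitative gap the lecture quantifies.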

Syllabus

Francis Bach: Scaling Laws for Gradient Descent & Sign Descent for Linear Bigram Models under Zipf's

Taught by

Centre de recherches mathématiques - CRM

