Why Does Adam Work So Well for LLMs? And Can We Find Optimal Per-Variable Step Sizes

NYU Tandon School of Engineering via YouTube

ECE AI SEMINAR: Why does Adam work so well for LLMs? And can we find optimal per-variable step sizes
