How AI Videos Actually Work - Diffusion Models, CLIP, and the Math of Turning Text into Images

How AI Videos Actually Work - Diffusion Models, CLIP, and the Math of Turning Text into Images

3Blue1Brown via YouTube Direct link

0:00 - Intro

1 of 16

1 of 16

0:00 - Intro

Class Central Classrooms beta

YouTube videos curated by Class Central.

Classroom Contents

How AI Videos Actually Work - Diffusion Models, CLIP, and the Math of Turning Text into Images

Automatically move to the next video in the Classroom when playback concludes

  1. 1 0:00 - Intro
  2. 2 3:37 - CLIP
  3. 3 6:25 - Shared Embedding Space
  4. 4 8:16 - Diffusion Models & DDPM
  5. 5 11:44 - Learning Vector Fields
  6. 6 22:00 - DDIM
  7. 7 25:25 Dall E 2
  8. 8 26:37 - Conditioning
  9. 9 30:02 - Guidance
  10. 10 33:39 - Negative Prompts
  11. 11 34:27 - Outro
  12. 12 35:32 - About guest videos + Grant’s Reaction
  13. 13 6:15 CLIP: Although directly minimizing cosine similarity would push our vectors 180 degrees apart on a single batch, overall in practice, we need CLIP to maximize the uniformity of concepts over the…
  14. 14 Per Chenyang Yuan: at 10:15, the blurry image that results when removing random noise in DDPM is probably due to a mismatch in noise levels when calling the denoiser. When the denoiser is called on x…
  15. 15 For the vectors at 31:40 - Some implementations use fx, t, cat + alphafx, t, cat - fx, t, and some that do fx, t + alphafx, t, cat - fx, t, where an alpha value of 1 corresponds to no guidance. I cho…
  16. 16 At 30:30, the unconditional t=1 vector field looks a bit different from what it did at the 17:15 mark. This is the result of different models trained for different parts of the video, and likely a re…

Never Stop Learning.

Get personalized course recommendations, track subjects and courses with reminders, and more.

Someone learning on their laptop while sitting on the floor.