Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

YouTube

Masked Self-Attention from Scratch in Python

Yacine Mahdid via YouTube

Overview

This tutorial walks through implementing masked self-attention from scratch using Python and NumPy. Learn the theoretical foundations of self-attention mechanisms before diving into a step-by-step coding implementation. The 14-minute guide covers the complete algorithm recipe used in large language model pretraining, breaking down the process into five key coding steps: computing query-key-value matrices, calculating attention scores, applying masks, implementing softmax calculations, and generating the final output. Follow along with the provided deep-ml problem set to gain hands-on experience with this fundamental component of modern language models. Understanding the nuances of masked self-attention will enhance your comprehension of how large language models are pretrained.
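The five steps above can be sketched in NumPy as a single-head, causal (masked) self-attention function. This is a minimal illustration of the general technique, not the video's exact code; all variable names and the choice of scaled dot-product scores are assumptions.

```python
import numpy as np

def masked_self_attention(X, W_q, W_k, W_v):
    """Single-head masked self-attention. X has shape (seq_len, d_model);
    the weight matrices project d_model -> d_k. Illustrative sketch only."""
    # Step 1: compute the query, key, and value matrices.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Step 2: scaled dot-product attention scores.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)

    # Step 3: apply a causal mask so each position attends only to
    # itself and earlier positions (-inf becomes 0 after softmax).
    seq_len = X.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Step 4: row-wise softmax (subtract the row max for stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

    # Step 5: the output is the attention-weighted sum of the values.
    return weights @ V

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = masked_self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the first row of the output depends only on the first input position: its attention weights are all on itself, so row 0 of the output equals row 0 of the value matrix.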

Syllabus

- Introduction: 0:00
- Self-Attention Theory: 0:32
- Algorithm Recipe: 2:18
- Code Step 1 - Compute QKV: 7:26
- Code Step 2 - Compute Attention Scores: 9:18
- Code Step 3 - Applying Mask: 10:10
- Code Step 4 - Softmax Calculation: 10:37
- Code Step 5 - Computing the Output: 12:50
- Conclusion: 13:27

Taught by

Yacine Mahdid

