
YouTube

Transformer Self-Attention - Calculating Attention Scores - LLM Series Lecture 7

Code With Aarohi via YouTube

Overview

Explore the mechanics of Scaled Dot-Product Attention in this 17-minute lecture, which breaks down one of the central concepts in Transformer models from the "Attention Is All You Need" paper. Learn how attention scores are calculated step by step within the Transformer encoder, building on the earlier lectures' treatment of Query, Key, and Value vectors. Follow the complete pipeline from input preparation, through tokenization, embeddings, and positional encoding, to the final computation of context-aware representations. See why attention scores are scaled by √dₖ and how softmax turns the scaled scores into attention weights that are then multiplied with the Value vectors. Gain clear insight into the matrix shapes involved, including Q, K, Kᵀ, QKᵀ, and the output, and develop intuition for how self-attention makes each token's representation depend on its context. By the end, you will understand the formula Attention(Q, K, V) = softmax(QKᵀ / √dₖ) × V and how it is implemented in Transformer architectures, making the lecture a good fit for beginners learning Transformers, students of Deep Learning and NLP, and anyone preparing for AI interviews or research.
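The lecture works through this computation by hand; as a rough companion, here is a minimal NumPy sketch of the same formula. This is not the instructor's code, and the sequence length and dimensions are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]                     # key dimension, used for scaling
    scores = Q @ K.T / np.sqrt(d_k)       # (seq_len, seq_len) scaled scores
    # Softmax over each row turns scores into attention weights;
    # subtracting the row max first is a standard numerical-stability trick.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                    # (seq_len, d_v) context-aware output

# Toy example (hypothetical values): 3 tokens, d_k = d_v = 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # queries: (3, 4)
K = rng.standard_normal((3, 4))   # keys:    (3, 4); K.T is (4, 3), so QKᵀ is (3, 3)
V = rng.standard_normal((3, 4))   # values:  (3, 4)
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

The intermediate shapes mirror those discussed in the lecture: QKᵀ is (seq_len, seq_len), one score per token pair, and multiplying the softmaxed weights by V returns to (seq_len, d_v).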

Syllabus

L-7 Transformer Self-Attention | Calculating Attention Scores | LLM Series

Taught by

Code With Aarohi

