Evolution of Transformer Architectures - From Attention to Modern Variants
Neural Breakdown with AVB via YouTube
Overview
Explore the evolution of neural attention mechanisms in this 25-minute technical video, starting from Bahdanau Attention and progressing through Self-Attention and Causal Masked Attention as introduced in the "Attention Is All You Need" paper. Dive deep into Multi-Headed Attention and its more efficient variants, Multi Query Attention and Grouped Query Attention, while learning about KV Caching, a key inference optimization in Transformer and Large Language Model architectures. Through detailed visualizations and graphics, build an understanding of language modeling, next-word prediction, and the attention mechanisms that have shaped modern AI architectures. The material follows a structured progression covering Self Attention, Causal Masked Attention, Multi-Headed Attention, KV Cache, Multi Query Attention, and Grouped Query Attention, with special emphasis on performance implications and architectural trade-offs.
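To make Self-Attention and Causal Masked Attention concrete, here is a minimal single-head sketch in plain Python (no NumPy), assuming toy projection matrices. The function names are hypothetical and this is not the video's code. Each token's query is scored against the keys of itself and earlier tokens only, scaled by the square root of d_k, normalized with softmax, and used to average the corresponding value vectors.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain nested lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a causal mask.

    x is a seq_len x d_model list of token vectors; w_q, w_k, w_v are
    d_model x d_k projection matrices (hypothetical toy weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    out = []
    for i in range(len(x)):
        # Causal mask: token i only scores keys at positions 0..i,
        # which is equivalent to setting future scores to -inf.
        scores = [sum(q[i][c] * k[j][c] for c in range(d_k)) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out.append([sum(weights[j] * v[j][c] for j in range(i + 1))
                    for c in range(len(v[0]))])
    return out

identity = [[1, 0], [0, 1]]
x = [[1, 0], [0, 1], [1, 1]]
print(causal_self_attention(x, identity, identity, identity))
```

With identity projections the first token can only attend to itself, so its output row is exactly its own value vector, [1.0, 0.0]. Changing the third token leaves the first two output rows unchanged, which is the property the causal mask exists to guarantee for next-word prediction.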
Syllabus
Correction to the slide at -. MHA has higher latency (runs slower); MQA has lower latency (runs faster).
- Intro
- Language Modeling and Next Word Prediction
- Self Attention
- Causal Masked Attention
- Multi Headed Attention
- KV Cache
- Multi Query Attention
- Grouped Query Attention
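The Multi Headed Attention, KV Cache, Multi Query Attention, and Grouped Query Attention chapters fit into one picture: during autoregressive decoding each new token's keys and values are cached, and the variants differ only in how many key/value heads the query heads share. The sketch below is a hedged illustration in plain Python with hypothetical names (`KVCache`, `gqa_decode_step`, `kv_cache_bytes`), not the video's implementation. Setting the number of KV heads equal to the number of query heads gives MHA, 1 gives MQA, and values in between give GQA. No explicit mask is needed here because the cache only ever holds the current and earlier tokens.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class KVCache:
    """Hypothetical per-layer cache: one growing list of key/value vectors per KV head."""

    def __init__(self, n_kv_heads):
        self.keys = [[] for _ in range(n_kv_heads)]
        self.values = [[] for _ in range(n_kv_heads)]

    def append(self, k_heads, v_heads):
        # Store the new token's K/V once; earlier tokens are reused, never recomputed.
        for h in range(len(self.keys)):
            self.keys[h].append(k_heads[h])
            self.values[h].append(v_heads[h])

def gqa_decode_step(q_heads, k_heads, v_heads, cache):
    """Attend from the newest token to every cached token.

    q_heads has n_heads query vectors; k_heads/v_heads have n_kv_heads vectors.
    n_kv_heads == n_heads gives MHA, n_kv_heads == 1 gives MQA, anything in
    between (dividing n_heads evenly) gives GQA.
    """
    n_heads, n_kv = len(q_heads), len(cache.keys)
    if n_heads % n_kv != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group_size = n_heads // n_kv
    cache.append(k_heads, v_heads)
    outputs = []
    for h, q in enumerate(q_heads):
        kv = h // group_size  # consecutive query heads share one KV head
        keys, values = cache.keys[kv], cache.values[kv]
        scale = math.sqrt(len(q))
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * val[c] for w, val in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # 2x for keys and values; bytes_per_value=2 assumes fp16/bf16 storage.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative config (assumed, not from the video): 32 layers, 32 query heads,
# head_dim 128, 4096 cached tokens.
for name, n_kv in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    print(name, kv_cache_bytes(32, n_kv, 128, 4096) // 2**20, "MiB")
```

Under that assumed configuration the loop prints 2048 MiB for MHA, 512 MiB for GQA with 8 KV heads, and 64 MiB for MQA. Because each decoding step reads the whole cache, a smaller cache generally means less memory traffic per generated token, which is one common explanation for MQA running faster than MHA. The trade-off is that fewer KV heads give the model less capacity to represent keys and values; GQA sits between the two extremes.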
Taught by
Neural Breakdown with AVB