
YouTube

Attention to Transformers from Zero to Hero - Theory and Hands-on Projects

Neural Breakdown with AVB via YouTube

Overview

Follow the complete journey from neural attention mechanisms to advanced transformer architectures in this comprehensive 4-hour tutorial, which builds understanding from first principles. Master the mathematical foundations and intuitions behind attention and self-attention, and see how transformers changed machine learning by removing the inductive biases traditionally built into neural networks.

Trace a decade of NLP evolution from Word2Vec and RNNs to modern GPT models, then implement generative language models step by step through hands-on coding exercises. Learn how transformer architectures have evolved beyond the original "Attention Is All You Need" paper, and gain practical experience fine-tuning large language models with Hugging Face and PyTorch for custom applications.

Dive into Vision Transformers to understand why they work so well for computer vision, explore the Sparse Mixture of Experts architectures behind efficient LLMs like DeepSeek and Mixtral, and build speech-to-text transformers from scratch in PyTorch. The tutorial progresses from beginner-friendly explanations to advanced implementations, covering theory, mathematical foundations, and practical applications across natural language processing, computer vision, and speech modeling.
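To give a flavor of the core operation the course builds on, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V (this illustration is ours, not taken from the course materials):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V                  # weighted sum of value vectors

# Toy example: 4 tokens, embedding dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query token
```

The course's early videos unpack exactly why this weighted-average formulation is so effective, before scaling it up to multi-head attention and full transformers.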

Syllabus

Neural Attention - This simple example will change how you think about it
The many amazing things about Self-Attention and why they work
Here is how Transformers ended the tradition of Inductive Bias in Neural Nets
10 years of NLP history explained in 50 concepts | From Word2Vec, RNNs to GPT
From Attention to Generative Language Models - One line of code at a time!
Turns out Attention wasn't all we needed - How have modern Transformer architectures evolved?
Finetune LLMs to teach them ANYTHING with Huggingface and Pytorch | Step-by-step tutorial
Vision Transformers - The big picture of how and why it works so well.
Sparse Mixture of Experts - The transformer behind the most efficient LLMs (DeepSeek, Mixtral)
Building awesome Speech To Text Transformers from scratch - One line of Pytorch at a time!

Taught by

Neural Breakdown with AVB

