Overview
Explore a comprehensive video explanation of the BigBird paper, which introduces a sparse attention mechanism that lets transformers handle longer sequences. Learn why full attention is costly (its memory requirements grow quadratically with sequence length) and how BigBird addresses this by combining random, window, and global attention. Discover the theoretical foundations, including universal approximation and Turing completeness, as well as the practical implications for NLP tasks such as question answering and summarization. Gain insights into the experimental parameters, the structured block computations, and the results, which show improved performance on various NLP tasks and point to potential applications in genomics.
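To make the three attention patterns concrete, here is a minimal NumPy sketch of a BigBird-style attention mask. It is an illustration under stated assumptions, not the paper's implementation: the function name `bigbird_style_mask`, the parameter values, and the choice of the first tokens as global tokens are all hypothetical, and the sketch works per token rather than on the structured blocks covered later in the video.

```python
import numpy as np


def bigbird_style_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Return a boolean mask where mask[i, j] is True if query i may attend to key j.

    Hypothetical helper for illustration; all parameter values are assumptions.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Window attention: each token sees its neighbors, window // 2 on each side.
    half = window // 2
    for i in range(seq_len):
        mask[i, max(0, i - half):min(seq_len, i + half + 1)] = True

    # Global attention: the first num_global tokens (an assumed choice)
    # attend to every position and are attended to by every position.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # Random attention: each query additionally picks num_random random keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask


mask = bigbird_style_mask(seq_len=512)
full_entries = mask.size          # seq_len ** 2 entries for full attention
sparse_entries = int(mask.sum())  # entries actually kept by the sparse pattern
print(f"full attention entries:   {full_entries}")
print(f"sparse attention entries: {sparse_entries}")
```

With these settings, each non-global row keeps at most `window + num_global + num_random` entries, so the number of attended pairs grows roughly linearly with sequence length instead of quadratically.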
Syllabus
- Intro & Overview
- Quadratic Memory in Full Attention
- Architecture Overview
- Random Attention
- Window Attention
- Global Attention
- Architecture Summary
- Theoretical Result
- Experimental Parameters
- Structured Block Computations
- Recap
- Experimental Results
- Conclusion
Taught by
Yannic Kilcher