Overview
This 22-minute video introduces Vision Transformers (ViT). It explains what a ViT is and why the architecture was developed as an alternative to convolutional neural networks for image processing. The core idea is that ViTs adapt the transformer architecture, originally designed for natural language processing, to visual data by treating image patches as a sequence. The video then covers pretraining, where the model learns visual representations from large datasets, and fine-tuning, where a pretrained ViT is adapted to a specific downstream task. It closes with a quiz and a summary of the key concepts. Slides, references to the original Vision Transformer paper, and pointers to background material on transformers are included. The video is aimed at machine learning practitioners, computer vision enthusiasts, and researchers who want to understand how ViTs approach image classification and other visual understanding tasks.
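As a rough illustration of treating image patches as a sequence, the sketch below splits an image into non-overlapping square patches and flattens each one into a vector. It is not taken from the video: the function name `image_to_patch_sequence`, the 224x224 RGB input, and the 16x16 patch size are assumptions chosen for the example, and plain Python lists are used only to keep it dependency-free (a real model would use a tensor library and follow this step with a learned linear projection and position embeddings).

```python
def image_to_patch_sequence(image, patch_size):
    """Turn an image (nested lists shaped [height][width][channels]) into a
    list of flattened, non-overlapping patches in row-major patch order.

    Hypothetical helper for illustration only.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: rows first, then pixels, then channels.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            tokens.append(patch)
    return tokens


# Assumed example input: a blank 224x224 RGB image and 16x16 patches.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patch_sequence(image, patch_size=16)
print(len(tokens), len(tokens[0]))  # 196 768
```

Under these assumptions the image becomes 14 x 14 = 196 patch vectors of length 16 x 16 x 3 = 768 each. That sequence plays the role a sequence of word embeddings plays in a language transformer.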
Syllabus
What is ViT?
Why do we have ViTs?
Pretraining
Fine tuning
Quiz Time
Summary
Taught by
CodeEmporium