Visual Question Answering: Grounded Systems and Transformer Capsules
University of Central Florida via YouTube
Syllabus
Intro
Grounded Visual Question Answering
Limitations of Existing VQA Systems
Grounded VQA Systems
Problem Setup
Transformers with Capsules
Approach
Capsule-based Tokens
Input to Intermediate Transformer Layers
Text-based Residual Connection
Pre-training Tasks
Masked Language Modeling (MLM)
Image Text Matching
Pre-training Datasets
Fine-tuning on Downstream Task
Qualitative Comparison - GQA
Evaluation Metrics
Results - GQA
Conclusion and Future Work
Taught by
UCF CRCV