Multimodal AI: From First Principles to Neural Networks That See, Hear and Write
Neural Breakdown with AVB via YouTube
Overview
This 20-minute technical video covers the fundamental principles of multimodal artificial intelligence: neural networks that process visual, auditory, and textual information together. It discusses models released in 2023, including GPT-4, PaLM 2, and ImageBind, and explains how multimodal modeling combines different input types to perform tasks such as text-image retrieval, multimodal vector arithmetic, and visual question answering. The video starts with the basics, then moves through contrastive learning, masked visual language models, and unified models, and ends with generative large language models. Along the way it draws on published research papers and real-world applications to explain the essential techniques in multimodal modeling. Scripts, slides, animations, and illustrations are available through channel membership.
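As a rough illustration of the text-image retrieval task mentioned in the overview, the sketch below ranks images against a text query by cosine similarity in a shared embedding space. It is not taken from the video: the three-dimensional vectors, the file names, and the `retrieve` helper are all hypothetical stand-ins for what trained image and text encoders would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: in a real system these would come from
# image and text encoders trained to share one vector space.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.1, 0.9, 0.1],
    "beach.jpg": [0.0, 0.2, 0.9],
}
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a car on a road": [0.2, 0.8, 0.0],
}

def retrieve(query_vec, candidates):
    """Return the candidate name whose embedding is most similar to the query."""
    return max(candidates, key=lambda name: cosine(query_vec, candidates[name]))

best_for_dog = retrieve(text_embeddings["a photo of a dog"], image_embeddings)
best_for_car = retrieve(text_embeddings["a car on a road"], image_embeddings)
print(best_for_dog, best_for_car)
```

The same nearest-neighbour lookup works in the other direction (image query, text candidates), because both modalities are assumed to live in one space.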
Syllabus
- Intro
- Basics
- Contrastive Learning
- Masked Visual Language Models
- Unified Models
- Generative LLMs
Taught by
Neural Breakdown with AVB