The Marvelous Magic of Multimodal AI - Understanding Text, Images, Audio, and Video Generation

GOTO Conferences via YouTube

Overview

This conference talk explores multimodal AI: systems that can understand and generate text, images, audio, and video. It explains the fundamental differences between Large Language Models (LLMs) and Large Multimodal Models (LMMs), and shows how technologies like Molmo by Ai2 are extending what AI can do. Topics include the role of data representation, the challenges of natural language processing, and the importance of context in AI systems. The talk also covers the inner workings of multimodal AI, with a focus on text-to-image generation and future applications. Using practical examples, data scientist and former INDYCAR engineer Alex Castrounis shows how multimodal AI is changing human-computer interaction, including the potential to autonomously create complete explainer videos with generated scripts, visuals, music, and animations.

Syllabus

Intro
LLM vs LMM
What is multimodal AI?
Molmo by Ai2
Data ≈ Representation
Natural language is hard
What about context?
The inner workings
Text-to-images
The AI of tomorrow
Outro

Taught by

GOTO Conferences
