Introduction to Multimodal Large Language Models I - Day 10 Morning
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore the fundamentals of multimodal large language models through comprehensive tutorial slides presented by experts from University of Maryland, Brno University of Technology, and Universidad Autónoma de Madrid. Learn core concepts, architectures, and applications of models that can process and understand multiple types of data including text, images, and audio simultaneously. Discover how these advanced AI systems integrate different modalities to perform complex reasoning tasks, understand cross-modal relationships, and generate coherent responses across various input types. Examine the technical foundations underlying multimodal LLMs, including attention mechanisms, fusion strategies, and training methodologies that enable these models to bridge the gap between different forms of human communication and expression.
Syllabus
[slides] Day 10 morning - JSALT 2025 - Introduction to Multimodal Large Language Models I.
Taught by
Center for Language & Speech Processing (CLSP), JHU