Introduction to Multimodal Large Language Models II - Day 10 Afternoon
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore advanced concepts in multimodal large language models in this lab-focused tutorial session. Work through hands-on Python notebook exercises covering simple examples, practical use of the AF2 and AF3 models, experimentation with MMAU-Pro, data preparation for AQA (Audio Question Answering), and basic training procedures. Alicia Lozano-Diez (Universidad Autónoma de Madrid) and Ramani Duraiswami (University of Maryland) guide you through practical applications of large audio language models. Accompanying GitHub repositories and Colab notebooks are provided to reinforce the material on expert-level reasoning and understanding in multimodal AI systems. The session builds on the prerequisite Introduction to Multimodal Large Language Models I and develops practical skills in implementing and training these models.
Syllabus
Day 10 afternoon - JSALT 2025 - Introduction to Multimodal Large Language Models II
Taught by
Center for Language & Speech Processing (CLSP), JHU