IBM

Developing Multimodal Generative AI Applications

IBM via edX

Overview

Unlock the power of multimodal AI and learn how modern systems combine text, images, speech, and video to create intelligent applications. This course teaches the foundational concepts behind multimodal GenAI applications, the challenges of integrating diverse data types, and the techniques used to build advanced, interactive systems. You’ll develop core skills in transcription, text-to-speech, image generation, video synthesis, and multimodal reasoning.

Through hands-on labs, you’ll work with generative AI models such as IBM Granite, OpenAI Whisper, DALL·E, Sora, Meta’s Llama, and Mixtral, along with vision-language architectures, to apply multimodal AI in practical scenarios. You’ll build tools such as captioning systems, video-from-text generators, and AI-powered assistants that can process and respond across multiple data streams.

The course includes full-stack projects using Python, Flask, and Gradio, where you’ll design and deploy complete multimodal AI applications. By the end, you’ll have the technical skills needed to create next-generation AI systems used in search engines, chatbots, creative tools, and enterprise applications.

Syllabus

  • Gain the job-ready skills you need to build multimodal generative AI applications in just a few hours

  • Understand the fundamental concepts and challenges in multimodal AI, including the integration of text, speech, images, and video

  • Build multimodal AI applications using state-of-the-art models and frameworks such as IBM Granite, Meta’s Llama, OpenAI Whisper, DALL·E, and Sora

  • Develop multimodal AI solutions, including chatbots and image/video generation models, using IBM watsonx.ai, Hugging Face, Flask, and Gradio

  • Apply multimodal search, retrieval, and question answering techniques to solve practical problems

  • Design and deploy full-stack multimodal systems that combine audio, vision, and language models
