Multimodal Prompting: Combining Text, Images, Audio & Video

via Coursera

Overview

Learn how to get better, more useful results from modern multimodal AI tools using text, images, and audio, without needing any coding experience. You’ll start by understanding what multimodal AI is, how it differs from text‑only chatbots, and when to use text, image, or audio inputs for everyday tasks. You’ll also set up a simple multimodal workspace using common tools so you can immediately apply what you learn.

Through hands‑on, step‑by‑step activities, you’ll practice prompting with images to extract text, interpret diagrams or whiteboards, and troubleshoot common image‑related issues by adding context, constraints, and better visuals. You’ll then explore audio and voice‑to‑text prompting to quickly capture ideas, turn spoken thoughts into structured outlines, and analyze meeting recordings for transcripts, summaries, and action items.

Finally, you’ll connect all three modalities (text, image, and audio) into practical workflows, such as turning a hand‑drawn sketch and spoken brief into a structured plan, or using screenshots and transcripts to summarize video content. You’ll finish the course with a simulated client scenario, a final assessment, and a clear set of next steps for continuing to build your multimodal prompting skills.

Syllabus

  • Introduction to Multimodal AI
    • In this module, you'll explore the fundamentals of multimodal AI and discover how combining text, images, and audio can enhance AI's usefulness in everyday work. You'll learn why text-only prompting is often insufficient, see practical examples where other modalities add value, and start setting up your workspace with common tools. This foundation will help you choose modalities intentionally and work confidently with multimodal systems.
  • Mastering Image Inputs (Vision)
    • This module focuses on using images as prompts to help AI extract, organize, and interpret visual information. You'll learn how AI processes photos, screenshots, whiteboards, and notes, and practice applying image prompting to real tasks like digitizing content and diagnosing visual problems. You'll also discover common limitations and how to improve results with clearer images, stronger context, and precise constraints.
  • Speaking and Listening (Audio)
    • In this module, you'll see how audio can make AI interactions faster, more natural, and more useful in real work settings. You'll explore voice-to-text prompting for brainstorming and mobile use, and learn how transcription and summarization can boost meeting productivity. Practical habits for better spoken input and reviewing transcripts will help you get the most from audio prompts.
  • Combining Modalities (Text + Image + Audio)
    • This module brings multimodal prompting together into practical workflows that reflect how AI is used in design, consulting, and knowledge work. You'll learn how one input can anchor a task while another provides context or refinement, and practice applying these patterns to sketches, video materials, and simulated client work. This will give you a realistic view of how multimodal systems support richer analysis and stronger deliverables.
  • Course Wrap-Up & Next Steps
    • In this final module, you'll consolidate your learning and prepare to continue using multimodal AI beyond the course. You'll review common mistakes, learn how to choose tools and modalities effectively, and identify next steps for ongoing practice. The module concludes with a final assessment to confirm your understanding and help you develop a practical strategy for future multimodal work.

Taught by

Anton Voroniuk
