Building Multimodal AI Agents

via Coursera

Overview

By completing this course on building multimodal AI agents, you will learn the orchestration techniques used to automate enterprise-scale digital content production. You will learn to reduce context fragmentation, build automated brand style guardians, keep multi-frame video consistent, and deploy persistent autonomous project workspaces. The course bridges the gap between basic prompting and scalable systems engineering, providing operational frameworks for turning raw enterprise briefs into finished visual assets with minimal manual intervention.

The course takes a hands-on, architectural approach to the leading AI platforms. Instead of treating AI as a simple conversational chatbot, you will learn to manage ChatGPT, Claude, Gemini, and Manus AI as a coordinated workforce with a shared memory layer. You will build and configure multi-agent systems, set up custom configurations through specialized dashboards, and deploy autonomous operators to execute complex web and file-compilation loops. Whether you are a software engineer optimizing token efficiency or a project manager scaling a go-to-market workflow, the course provides a structured collection of practical, non-conversational prompt frameworks for building with AI.

Syllabus

  • Introduction to Multimodal AI Agents
    • Discover how multimodal AI agents evolve from simple prompting into autonomous systems that handle text, images, and audio seamlessly.
  • Visual and Image Generation Agents
    • Learn how to set up agents that automatically analyze visual inputs and generate tailored, high-quality images.
  • Automated Presentation and Document Agents
    • Master the use of agents that transform raw ideas and messy data into professional, visually stunning presentations and reports.
  • Video and Content Creation Agents
    • Explore how agents can take a single concept and autonomously script, storyboard, and generate video content.
  • Orchestrating Multi-Agent Content Teams
    • Connect text, image, presentation, and video agents together into a unified, collaborative AI content creation team.
  • The Future of AI and Course Wrap-Up
    • Analyze the future trends, real-world impacts, and ethical considerations of widespread autonomous multimodal AI usage.

Taught by

Anton Voroniuk
