Building Multimodal AI Agents From Scratch

AI Engineer via YouTube

Overview

In this hands-on workshop, you build a multimodal AI agent from scratch that processes mixed-media content, including charts, diagrams, and documents with embedded visuals. You implement the core components directly in Python rather than relying on pre-built frameworks, using MongoDB as both the vector database and the memory store and Google's Gemini for multimodal reasoning. Along the way you gain practical experience with multimodal data processing pipelines and agent orchestration patterns, and practice extracting insights from different types of visual content. A GitHub repository with supporting materials and resources is provided for use throughout the workshop.
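
As a rough illustration of the retrieve-then-reason pattern this description points to, here is a minimal Python sketch. It is not the workshop's code. An in-memory list stands in for MongoDB, hand-written three-number vectors stand in for real multimodal embeddings, and `fake_reason` stands in for a call to Gemini. All names (`Chunk`, `InMemoryStore`, `run_agent`, `fake_reason`) and the file names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrievable item, e.g. a chart or page image (hypothetical structure)."""
    source: str
    embedding: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryStore:
    """Assumption: a stand-in for the vector database and memory store."""
    chunks: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def search(self, query_embedding, k=2):
        # Rank every stored chunk by similarity to the query and keep the top k.
        ranked = sorted(
            self.chunks,
            key=lambda c: cosine(query_embedding, c.embedding),
            reverse=True,
        )
        return ranked[:k]


def run_agent(question, query_embedding, store, reason):
    """Retrieve context, call the reasoning function, then record the turn."""
    context = store.search(query_embedding)
    answer = reason(question, context, store.history)
    store.history.append((question, answer))
    return answer


def fake_reason(question, context, history):
    # Assumption: placeholder for a multimodal model call; it only lists sources.
    sources = ", ".join(c.source for c in context)
    return f"Answer to '{question}' using: {sources}"


store = InMemoryStore(chunks=[
    Chunk("revenue_chart.png", [0.9, 0.1, 0.0]),
    Chunk("architecture_diagram.png", [0.1, 0.9, 0.0]),
    Chunk("report_page_3.png", [0.7, 0.2, 0.1]),
])
answer = run_agent("How did revenue change?", [1.0, 0.0, 0.0], store, fake_reason)
print(answer)
```

In a real build, the store's `search` method would presumably be replaced by a vector query against the database, and `fake_reason` by a model call that receives the retrieved images alongside the question. The loop itself (retrieve, reason, remember) would stay the same shape.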

Syllabus

Building Multimodal AI Agents From Scratch — Apoorva Joshi, MongoDB

Taught by

AI Engineer
