
Coursera

Multimodal RAG with GPT – Build Smarter Search & AI Systems

Packt via Coursera

Overview

Updated in May 2025. This course now features Coursera Coach, a smarter way to learn with interactive, real-time conversations that help you test your knowledge, challenge assumptions, and deepen your understanding as you progress through the course.

This course equips you with the skills to build smarter AI-driven systems using Retrieval Augmented Generation (RAG) and multimodal technology. You'll dive into the principles behind RAG and how it powers systems like advanced search engines, chatbots, and recommendation systems. The course provides hands-on experience, enabling you to create multimodal systems that use images, text, and other forms of data to provide more intelligent and context-aware solutions.

Starting with foundational knowledge, you will explore RAG systems, their components, and benefits. The course covers how search capabilities can be integrated into multimodal systems and why this approach enhances both search and recommendation functionality. You'll build multimodal search systems, creating embeddings and setting up a robust workflow to integrate different data types. You will also learn to construct a multimodal recommender system that combines RAG with GPT.

As you progress, you will experiment with embedding images and using them in a vector database, setting up end-to-end systems, and refining them in hands-on lessons. You'll also add a user interface to your multimodal recommender system, creating a polished, interactive tool that can be deployed for real-world use. By the end, you will have built a comprehensive multimodal RAG system with a recommender engine, capable of delivering highly relevant results.

This course is ideal for AI enthusiasts, software developers, or data scientists looking to deepen their understanding of advanced search systems, recommendation algorithms, and the application of RAG in multimodal environments. A basic understanding of programming and machine learning concepts is recommended, and the course is suitable for intermediate learners.
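
To give a rough sense of the retrieval step the overview describes (embedding items, storing them, and feeding the best matches to a GPT model), here is a minimal sketch in plain Python. It is not taken from the course materials. The catalog, the three-dimensional vectors, and the names `CATALOG`, `retrieve`, and `build_prompt` are all hypothetical; a real system would get its vectors from a multimodal embedding model and keep them in a vector database rather than a list.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical catalog: the vectors are hand-written stand-ins for real
# embeddings, so that images and text share one vector space.
CATALOG = [
    {"id": "img-001", "modality": "image",
     "caption": "red running shoes", "vector": [0.9, 0.1, 0.0]},
    {"id": "txt-001", "modality": "text",
     "caption": "review of trail running shoes", "vector": [0.8, 0.2, 0.1]},
    {"id": "img-002", "modality": "image",
     "caption": "blue ceramic mug", "vector": [0.0, 0.1, 0.9]},
]


def retrieve(query_vector, k=2):
    """Return the k catalog items most similar to the query vector."""
    ranked = sorted(
        CATALOG,
        key=lambda item: cosine(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, hits):
    """Assemble the retrieved items into context for a language model."""
    context = "\n".join(
        f"- [{h['modality']}] {h['id']}: {h['caption']}" for h in hits
    )
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Assumed query embedding; in practice this comes from the same model
# that embedded the catalog.
query_vector = [1.0, 0.1, 0.0]
hits = retrieve(query_vector, k=2)
prompt = build_prompt("Which shoes would you recommend for running?", hits)
print(prompt)
```

The sketch stops at building the prompt. Sending it to a GPT model and rendering the answer in a user interface are the parts that depend on the specific tools the course uses.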

Syllabus

  • Introduction & Prerequisites
    • In this module, we will introduce the course’s objectives, explain the key concepts you'll need to understand, and give you a preview of the systems you'll build. We will also discuss the course structure, helping you prepare for what's ahead.
  • Development Environment Setup - Overview
    • In this module, we will guide you through the process of setting up the development environment for the course. You’ll ensure that all the necessary tools and dependencies are ready, setting you up for success in the hands-on sections.
  • RAG (Retrieval Augmented Generation) and Multimodal Systems Deep Dive
    • In this module, we will dive into the fundamentals of RAG systems, their applications, and the benefits they bring. Additionally, we will introduce multimodal RAG systems, showcasing how they differ and how they function.
  • Search in a Multimodal RAG System
    • In this module, we will break down how search is integrated within a multimodal RAG system. Using visual explanations, we will show what this kind of search can do and where it is most useful.
  • Hands-on: Multimodal Search RAG System
    • In this module, we will guide you through setting up a multimodal search system, from creating image embeddings to finalizing the system's functionality. You'll get hands-on experience with the full process.
  • Hands-on: Multimodal Recommender System
    • In this module, we will guide you through building a multimodal recommender system, from dataset retrieval to embedding generation. You’ll also learn how to set up the RAG flow and integrate a UI for better user interaction.
  • Next Steps
    • In this module, we will help you chart your path forward after completing the course. We’ll provide actionable next steps to continue your learning journey and explore how to apply your skills in real-world scenarios.

Taught by

Packt - Course Instructors
