
Vision-Language Models - A New Architecture for Embedding Models

Qdrant - Vector Database & Search Engine via YouTube

Overview

Explore the cutting-edge architecture of Vision-Language Models (VLMs) and their application as embedding models in this 19-minute conference talk from Qdrant's Vector Space Day 2025. Discover how transformer architectures enable VLMs to learn from mixed text-image inputs and serve as powerful backbones for embedding models like jina-embeddings-v4. Learn about training insights for VLM-based embedding models that support both dense single-vector and late-interaction multi-vector retrieval across multiple domains, tasks, and languages. Examine the particular strengths of VLMs when processing images containing text and diagrams, UI screenshots, and illustrations. Understand critical factors affecting performance including image resolution, retrieval objectives, and the impact of the modality gap on retrieval effectiveness. Gain insights into model evaluation methodologies and operational efficiency considerations through comparisons of post-training quantization versus quantization-aware training, including trade-offs between model footprint, throughput, and accuracy.
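The overview contrasts dense single-vector retrieval with late-interaction multi-vector retrieval. As a rough illustration of the difference (a minimal NumPy sketch, not code from the talk or from jina-embeddings-v4; the function names and dimensions are illustrative): a dense model scores a query-document pair with one similarity between pooled vectors, while late interaction keeps per-token vectors and scores via ColBERT-style MaxSim.

```python
import numpy as np

def dense_score(q, d):
    """Single-vector retrieval: one cosine similarity between
    a pooled query vector and a pooled document vector."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def late_interaction_score(Q, D):
    """Late-interaction (MaxSim) retrieval: every query token vector
    is matched to its most similar document token vector, and the
    per-token maxima are summed."""
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    sims = Q @ D.T                      # (query_tokens, doc_tokens)
    return float(sims.max(axis=1).sum())

# Toy data standing in for model outputs (illustrative dimensions).
rng = np.random.default_rng(0)
q_vec, d_vec = rng.normal(size=128), rng.normal(size=128)
Q_toks = rng.normal(size=(8, 128))     # 8 query token embeddings
D_toks = rng.normal(size=(40, 128))    # 40 document token embeddings

print(dense_score(q_vec, d_vec))
print(late_interaction_score(Q_toks, D_toks))
```

The trade-off the talk alludes to: the multi-vector path preserves token-level matching (useful for text-heavy images and screenshots) at the cost of storing many vectors per document.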
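On the efficiency side, the talk compares post-training quantization with quantization-aware training. A minimal sketch of the post-training variant (illustrative only, not the procedure used in the talk): symmetric int8 scalar quantization of stored embeddings, which cuts the memory footprint 4x relative to float32 while introducing a bounded reconstruction error.

```python
import numpy as np

def quantize_int8(vecs):
    """Post-training symmetric scalar quantization: map float32
    embeddings into int8 using one global scale factor."""
    scale = np.abs(vecs).max() / 127.0
    q = np.clip(np.round(vecs / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original float32 vectors."""
    return q.astype(np.float32) * scale

# Toy embedding matrix standing in for model output.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 128)).astype(np.float32)

q, scale = quantize_int8(emb)
recon = dequantize(q, scale)

print(emb.nbytes / q.nbytes)        # 4.0: float32 -> int8 footprint ratio
print(np.abs(emb - recon).max())    # worst-case reconstruction error
```

Quantization-aware training instead simulates this rounding during training so the model adapts its weights to it, typically recovering accuracy at the same footprint; the talk's comparison weighs that accuracy gain against the extra training cost.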

Syllabus

Vision-Language Models: A New Architecture for Embedding Models | Jina AI | Michael Günther

Taught by

Qdrant - Vector Database & Search Engine
