

Unlocking Insights From Multimodal PDFs Using OpenSearch and Vision-Language Models

Linux Foundation via YouTube

Overview

Explore advanced techniques for extracting insights from complex PDF documents containing text, tables, and images in this 33-minute conference talk from the Linux Foundation's Open Source Summit. Learn two powerful approaches to handle multimodal PDF content: building specialized pipelines that integrate OCR and machine learning models for processing diverse data types, and utilizing cutting-edge Vision-Language Models like ColPali to represent multimodal information in a unified format. Discover how to implement these methods using OpenSearch's robust search and ingest pipelines to create intelligent conversational search applications with open-source technology. Watch a live demonstration showcasing practical implementations that will help you determine which approach best suits your specific requirements for processing unstructured PDF documents and unlocking their hidden insights.
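The talk's exact code is only shown in the live demo, but the ColPali approach it describes relies on "late interaction" scoring: a page is embedded as many patch vectors, a query as many token vectors, and relevance is the sum over query tokens of each token's best-matching patch similarity (MaxSim). A minimal sketch of that scoring step, using NumPy and randomly generated embeddings as stand-ins for real model output:

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, page_embs: np.ndarray) -> float:
    """ColPali-style late-interaction score.

    For each query-token embedding, take the maximum cosine similarity
    over all page-patch embeddings, then sum across query tokens.
    """
    # Normalize rows so dot products become cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = page_embs / np.linalg.norm(page_embs, axis=1, keepdims=True)
    sim = q @ p.T  # shape: (num_query_tokens, num_page_patches)
    return float(sim.max(axis=1).sum())

def rank_pages(query_embs: np.ndarray, pages: list) -> list:
    """Return page indices sorted by descending MaxSim score."""
    scores = [maxsim_score(query_embs, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: -scores[i])

# Toy demo: a "relevant" page that literally contains the query vectors
# should outrank a page of unrelated random patches.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))                          # 4 query tokens, dim 8
relevant = np.vstack([query, rng.normal(size=(6, 8))])   # 10 patches incl. matches
unrelated = rng.normal(size=(10, 8))
print(rank_pages(query, [unrelated, relevant]))
```

In a real deployment, `query_embs` and the per-page patch embeddings would come from a vision-language model such as ColPali, with pages stored and retrieved through OpenSearch's ingest and search pipelines; the function names here are illustrative, not from the talk.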

Syllabus

Unlocking Insights From Multimodal PDFs Using OpenSearch and Vision-Language Models - Mingshi Liu & Praveen Mohan Prasad

Taught by

Linux Foundation

