

Build an AI Document Processing Pipeline for RAG - OCR, Image to Text, VLM, Chunking

Venelin Valkov via YouTube

Overview

This 26-minute tutorial walks through building a complete AI document processing pipeline that runs locally for RAG applications. It covers converting PDFs to Markdown, running OCR, extracting text from images with Vision Language Models, semantic chunking with LLMs, and contextual enrichment of chunks. The step-by-step guide uses Docling for document processing and Ollama with Gemma 3 for chunking. The instructor processes a sample NVIDIA financial report, visually inspects the results, handles image annotations, and tests the pipeline with a simple RAG implementation. The full-text tutorial and source code are available through MLExpert Pro, along with additional resources including the GitHub repository and evaluation methods for chunking strategies.
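To make the chunking stage concrete, here is a minimal sketch of splitting converted Markdown into chunks. It is not the tutorial's code: the video uses an LLM (Gemma 3 via Ollama) to choose chunk boundaries, whereas this stand-in splits at Markdown headings and packs paragraphs up to a size limit. The function name `chunk_markdown` and the `max_chars` parameter are assumptions made for illustration.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 500) -> list[str]:
    """Split Markdown at headings, then pack paragraphs into chunks of at most
    roughly max_chars characters. A heading-based stand-in for LLM chunking."""
    # Split just before every heading line so each section keeps its heading.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks: list[str] = []
    for section in sections:
        current = ""
        for para in section.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            # Start a new chunk when adding this paragraph would exceed the limit.
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks


# Made-up sample text, only to show the call shape.
sample = "# Revenue\n\nRevenue grew.\n\n## Data Center\n\nData center details."
for chunk in chunk_markdown(sample):
    print(repr(chunk))
```

Splitting at headings keeps each chunk tied to one section of the document, which is the property an LLM-driven chunker also aims for, just with less awareness of meaning.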

Syllabus

00:00 - Welcome
01:01 - Document processing pipeline
02:07 - Full-text tutorial and source code on MLExpert.io
02:41 - Docling
03:53 - PDF document sample
04:38 - Notebook setup
05:45 - PDF to Markdown: OCR, layout analysis, image to text
08:45 - Visual inspection
11:02 - Image annotations
14:37 - Chunking with Ollama and Gemma 3
19:58 - Contextual enrichment retrieval
21:50 - Test the pipeline with simple RAG
24:42 - Conclusion
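The last stages in the syllabus (contextual enrichment, then a simple RAG test) can be sketched roughly as follows. This is an illustration under stated assumptions, not the instructor's implementation: in the tutorial the context for each chunk would come from an LLM, while here a hypothetical `describe` callable (defaulting to a fixed template) stands in for it, and retrieval uses plain word overlap instead of embeddings. All names (`Chunk`, `enrich`, `retrieve`) and the sample text are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    text: str
    context: str = ""  # short note situating the chunk within the document

    def enriched(self) -> str:
        # The context is prepended so it is indexed together with the text.
        return f"{self.context}\n\n{self.text}" if self.context else self.text


def enrich(
    texts: list[str],
    title: str,
    describe: Optional[Callable[[str, int, int], str]] = None,
) -> list[Chunk]:
    """Attach a context string to each chunk. `describe` is a placeholder for
    an LLM call; the default template is an assumption for illustration."""
    if describe is None:
        describe = lambda t, i, n: f"From '{title}', part {i + 1} of {n}."
    return [Chunk(t, describe(t, i, len(texts))) for i, t in enumerate(texts)]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Rank chunks by word overlap with the query (a stand-in for embeddings)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c.enriched())), reverse=True)
    return ranked[:top_k]


# Made-up sample chunks, only to show the call shape.
chunks = enrich(
    ["Data center revenue rose on GPU demand.", "Gaming revenue was flat."],
    title="Quarterly report",
)
best = retrieve("data center GPU", chunks, top_k=1)[0]
print(best.enriched())
```

In a real pipeline the retrieved, enriched chunks would be placed into the prompt of a local model to answer the question; that generation step is left out here to keep the sketch dependency-free.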

Taught by

Venelin Valkov
