PDF Document Ingestion Accelerator for GenAI Applications

Databricks via YouTube

Overview

Learn how to build an optimized Structured Streaming workflow for ingesting complex PDF documents into GenAI applications in this 35-minute conference talk. The talk addresses challenges that financial services customers commonly face when processing unstructured PDF and image documents for downstream GenAI tasks such as entity extraction and RAG-based knowledge Q&A. The pain points covered are inconsistent quality in scans of physical documents, complex documents whose tables and embedded images require slower Tesseract OCR processing, and the need for streamlined post-processing workflows. Optimization techniques include Apache Spark tuning, multi-threading, PDF object extraction, skew handling, and automatic retry logic, all aimed at accelerating the document ingestion pipeline. Databricks Specialist Solutions Architect Qian Yu shares insights on implementing production-ready data engineering solutions designed for GenAI use cases in the financial services sector.
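
To make two of the listed techniques concrete, here is a minimal sketch of multi-threading combined with automatic retry, using only the Python standard library. It is not taken from the talk: the names `with_retry` and `process_all`, the retry count, and the backoff schedule are all illustrative assumptions, and `func` stands in for whatever per-document step (for example, an OCR call) a real pipeline would run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, max_attempts=3, base_delay=0.0):
    """Wrap func so failures are retried with exponential backoff (hypothetical helper)."""
    def wrapper(item):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(item)
            except Exception:
                # Give up after the last attempt and let the caller see the error.
                if attempt == max_attempts:
                    raise
                # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    return wrapper


def process_all(items, func, max_workers=4, max_attempts=3, base_delay=0.0):
    """Run func over items in a thread pool; return (results, failures) keyed by item."""
    safe = with_retry(func, max_attempts, base_delay)
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Items are used as dict keys, so they must be hashable (e.g. file paths).
        futures = {item: pool.submit(safe, item) for item in items}
        for item, future in futures.items():
            try:
                results[item] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the whole batch.
                failures[item] = exc
    return results, failures
```

Collecting failures separately, rather than raising on the first one, lets a batch finish even when a few documents are unreadable; those documents can then be routed to a slower fallback path or inspected by hand.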

Syllabus

PDF Document Ingestion Accelerator for GenAI Applications

Taught by

Databricks
