Overview
Learn how to build an optimized structured streaming workflow for ingesting complex PDF documents into GenAI applications in this 35-minute conference talk. Discover solutions to common challenges that financial services customers face when processing unstructured PDF and image documents for downstream GenAI tasks such as entity extraction and RAG-based knowledge Q&A. Explore the main pain points: varying document quality in scans of physical documents, complex documents whose tables and embedded images require slower Tesseract OCR processing, and the need for streamlined post-processing workflows. Master key optimization techniques, including Apache Spark optimization, multi-threading, PDF object extraction, skew handling, and automatic retry logic, to accelerate your document ingestion pipeline. Gain insights from Databricks Specialist Solution Architect Qian Yu on implementing production-ready data engineering solutions designed for GenAI use cases in the financial services sector.
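As a rough illustration of two of the techniques named above, multi-threading and automatic retry, the following is a minimal Python sketch using only the standard library. It is not the talk's implementation: the helper names `with_retry` and `process_pages`, the backoff parameters, and the `extract` callable (a stand-in for whatever per-page OCR or object-extraction step a pipeline uses) are all assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, max_attempts=3, base_delay=0.1):
    """Wrap func so that failed calls are retried with exponential backoff.

    Hypothetical helper: the attempt count and delays are illustrative only.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Give up and re-raise once the last attempt has failed.
                if attempt == max_attempts:
                    raise
                # Wait base_delay, then 2x, 4x, ... before trying again.
                time.sleep(base_delay * 2 ** (attempt - 1))
    return wrapper


def process_pages(pages, extract, max_workers=4):
    """Run extract over pages in a thread pool, preserving input order.

    `extract` is a placeholder for a per-page step such as OCR.
    """
    safe_extract = with_retry(extract)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input pages.
        return list(pool.map(safe_extract, pages))
```

Threads suit this kind of step when the per-page work is I/O-bound or delegates to an external process, since the calling thread mostly waits. In a Spark job, a pattern like this would typically run inside each task rather than replace Spark's own parallelism.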
Syllabus
PDF Document Ingestion Accelerator for GenAI Applications
Taught by
Databricks