Overview
Learn how to build an optimized Structured Streaming workflow for complex PDF document ingestion in GenAI applications through this 35-minute conference talk. Discover solutions to common challenges that financial services customers face when processing unstructured PDF and image documents for downstream GenAI tasks such as entity extraction and RAG-based knowledge Q&A. Explore the main pain points: varying document quality from scanned physical documents, complex documents whose tables and embedded images require slower Tesseract OCR processing, and the need for streamlined post-processing workflows. Master key optimization techniques, including Apache Spark optimization, multi-threading, PDF object extraction, skew handling, and automatic retry logic, to accelerate your document ingestion pipeline. Gain insights from Databricks Specialist Solution Architect Qian Yu on implementing production-ready data engineering solutions designed for GenAI use cases in the financial services sector.
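As a rough illustration of two of the techniques named above, multi-threading and automatic retry, the following is a minimal Python sketch using only the standard library. It is not the implementation presented in the talk: the names `with_retry` and `process_documents`, the `extract` callable, and the retry and worker parameters are all hypothetical, chosen only to show the general pattern of running a slow per-document step in a thread pool and retrying transient failures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, max_attempts=3, base_delay=0.01):
    """Wrap func so a failed call is retried with exponential backoff.

    max_attempts and base_delay are illustrative defaults, not values from the talk.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Give up and re-raise once the final attempt has failed.
                if attempt == max_attempts:
                    raise
                time.sleep(base_delay * 2 ** (attempt - 1))
    return wrapper


def process_documents(paths, extract, max_workers=4):
    """Run extract(path) for every path in a thread pool.

    extract is a hypothetical stand-in for a slow per-document step such as OCR.
    Returns (results, failures): dicts keyed by path, so one bad document
    does not abort the whole batch.
    """
    safe_extract = with_retry(extract)
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(safe_extract, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                failures[path] = exc
    return results, failures
```

Collecting failures separately, rather than letting the first exception propagate, is one way to keep a batch moving when a few scanned documents are unreadable; the failed paths can then be routed to a slower fallback or a manual review queue.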
Syllabus
PDF Document Ingestion Accelerator for GenAI Applications
Taught by
Databricks