One Pipeline to Rule Them All - Unifying Multimodal and AI Data Processing with Daft

In this 26-minute conference talk from the Linux Foundation's Open Source Summit, learn how to unify multimodal and AI data processing with Daft, a Python-native data engine that removes the need to juggle separate tools such as ffmpeg, custom scripts, and Spark. Discover how Daft handles structured tables, images, and embeddings within a single framework, with native integrations for data catalogs like Iceberg and Delta Lake and for vector databases including Turbopuffer and Lance. Watch live demonstrations of large-scale document processing, batch inference, and multimodal ETL, all executed within one unified data pipeline. Explore how purpose-built infrastructure can streamline the processing of millions of images, documents, and structured records, whether you are working with terabyte-scale datasets for foundation model training or building real-time inference systems. See how to replace fragile, multi-tool data pipelines that break under production load with a single pipeline that speeds up iteration and handles multimodal data processing at scale.