Building a Multimodal Data Lakehouse with the Daft Distributed Python Dataframe
Databricks via YouTube
Overview
Explore how to build a multimodal data lakehouse using Daft, a next-generation distributed query engine, in this 18-minute conference talk. Learn how to process diverse data types, including numbers, strings, JSON, images, and PDFs, at scale through a familiar dataframe interface. Discover how Daft simplifies large-scale ETL, eliminating the need for bespoke data pipelines and custom tooling. See a demonstration of integrating Daft with existing infrastructure such as S3, Delta Lake, Databricks, and Spark to create a flexible data processing solution. Gain insights from Jay Chia, Co-Founder of Eventual Computing, on how Daft's Python- and Rust-based architecture handles multimodal data efficiently in modern data workloads.
Syllabus
Building a Multimodal Data Lakehouse with the Daft Distributed Python Dataframe
Taught by
Databricks