Exabyte-Scale Streaming Iceberg IO with Ray, Flink, and DeltaCAT

Learn to architect and implement exabyte-scale streaming data workflows on Apache Iceberg with Ray, Apache Flink, and DeltaCAT in this 34-minute conference talk from Ray Summit 2025. Discover how to integrate Ray with popular open-source streaming frameworks, including Apache Flink, Apache Beam, and Apache Spark, to manage massive-scale table operations. Explore practical techniques for running DeltaCAT's Iceberg management jobs on Ray alongside existing streaming pipelines to achieve reliable, high-throughput data processing. Gain insights into how Pinterest unified sampling, labeling, and training into a single scalable pipeline, turning dataset iteration from a bottleneck into an accelerator for rapid model improvement. Master the architectural patterns for building Iceberg-based workflows that sustain performance and reliability at exabyte scale across distributed streaming environments.