Building a Modern Streaming Data Pipeline with Apache Flink, Iceberg and Paimon
StreamNative via YouTube
Overview
Learn to architect a robust, end-to-end streaming data pipeline using Apache Flink, Apache Iceberg, and Apache Paimon in this 34-minute conference talk. Discover how to move beyond traditional batch ETL pipelines to build real-time, scalable, and cost-efficient data processing systems that seamlessly integrate event streams from Kafka with transactional data from MySQL.

Explore best practices for consuming high-throughput streaming data from Kafka, enriching streaming events with transactional data using Flink SQL, and implementing efficient stateful processing techniques for large-scale real-time analytics. Compare the key features and trade-offs of Apache Iceberg and Apache Paimon to determine when each is the better fit for a scalable, queryable streaming data lake.

Examine real-world use cases showing how companies adopt streaming data lake architectures to improve reporting, machine learning, and real-time operational analytics. Understand the advantages of streaming-first approaches over traditional batch ETL and data warehouse architectures, including lower latency, better cost efficiency, and fresher data. Gain insight into Paimon's native support for streaming workloads compared with Iceberg's batch-oriented design, and learn about Fluz's additional optimization capabilities, including streaming compaction, auto-tuning, and deduplication.
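To make the enrichment pattern described above concrete, here is a minimal Flink SQL sketch of the pipeline shape the talk covers: a Kafka source of events, a MySQL table ingested as a changelog via Flink CDC, a stateful streaming join to enrich the events, and a Paimon table as the sink. All table names, columns, and connector options below are illustrative assumptions, not taken from the talk; real deployments would adjust connectors, credentials, and paths to their environment.

```sql
-- Hypothetical example: table names, columns, and connector options are
-- assumptions for illustration, not the talk's actual configuration.

-- 1) Kafka source of raw order events.
CREATE TABLE order_events (
  order_id    STRING,
  customer_id STRING,
  amount      DECIMAL(10, 2),
  event_time  TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

-- 2) MySQL customer table, captured as a changelog with the Flink CDC connector.
CREATE TABLE customers (
  customer_id STRING,
  tier        STRING,
  region      STRING,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql',
  'port' = '3306',
  'username' = 'flink',
  'password' = '...',
  'database-name' = 'shop',
  'table-name' = 'customers'
);

-- 3) Paimon sink table for the enriched stream
--    (an Iceberg sink would use an analogous DDL).
CREATE TABLE enriched_orders (
  order_id    STRING,
  customer_id STRING,
  amount      DECIMAL(10, 2),
  tier        STRING,
  region      STRING,
  event_time  TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'paimon',
  'path' = 's3://lake/enriched_orders'
);

-- 4) Stateful streaming join: enrich each Kafka event with the current
--    MySQL row and write the result continuously to the lake.
INSERT INTO enriched_orders
SELECT o.order_id, o.customer_id, o.amount, c.tier, c.region, o.event_time
FROM order_events AS o
LEFT JOIN customers AS c
  ON o.customer_id = c.customer_id;
```

The regular `LEFT JOIN` here is a stateful streaming join, so Flink retains state for both inputs; production pipelines typically bound that state (for example with state TTL) or use a temporal join when only the latest dimension row is needed.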
Syllabus
Building a Modern Streaming Data Pipeline with Apache Flink, Iceberg and Paimon
Taught by
StreamNative