High-Throughput Streaming in Lakehouse with Non-Blocking Concurrency Control in Apache Flink and Hudi
StreamNative via YouTube
Overview
This 40-minute conference talk covers the architecture and implementation of Non-Blocking Concurrency Control (NBCC) in Apache Hudi. NBCC targets multi-stream concurrent ingestion into a single table, where traditional Optimistic Concurrency Control tends to cause bottlenecks and write failures when writers conflict. The talk walks through the pieces that make conflict-free streaming ingestion possible while keeping throughput and data freshness high: a file layout based on commit completion time, "TrueTime" semantics that keep timestamps globally monotonic, and the bucket index. It then shows how NBCC fits into Apache Flink pipelines, supporting event-time ordering and use cases such as real-time dataset joins. A Flink SQL demonstration runs multiple concurrent writers against one table, and the talk closes with planned work to extend NBCC to metadata tables, clustering, and additional index types.
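To give a rough feel for why a bucket index helps concurrent writers avoid conflicts, here is a toy Python model. It is only a sketch under stated assumptions, not Hudi's implementation: the names `bucket_for`, `write`, `merge_bucket`, and `NUM_BUCKETS` are hypothetical, the CRC32 hash is a stand-in for whatever hash the real index uses, and ordering by event time at read time is a simplification of the merge behavior described in the talk.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical fixed bucket count (assumption for this sketch)


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map a record key to a bucket (stand-in for a file group)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write(log_files, writer_id, records):
    """Append records to a per-writer log inside each bucket.

    Each (bucket, writer) pair has its own log, so writers never touch
    the same file and do not need to lock or retry.
    """
    for key, event_time, value in records:
        log_files[(bucket_for(key), writer_id)].append((key, event_time, value))


def merge_bucket(log_files, bucket):
    """Read-time merge: for each key in the bucket, keep the record with the greatest event time."""
    latest = {}
    for (b, _writer), entries in log_files.items():
        if b != bucket:
            continue
        for key, event_time, value in entries:
            if key not in latest or event_time > latest[key][0]:
                latest[key] = (event_time, value)
    return {key: value for key, (_, value) in latest.items()}


# Two writers update the same key concurrently; neither blocks the other.
log_files = defaultdict(list)
write(log_files, "writer-a", [("user-1", 10, "a1"), ("user-2", 12, "a2")])
write(log_files, "writer-b", [("user-1", 15, "b1")])
merged = merge_bucket(log_files, bucket_for("user-1"))
```

Because the key-to-bucket mapping is a pure function of the key, both writers agree on where `user-1` lives without consulting shared state, and the overlap is resolved later during the merge rather than at write time.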
Syllabus
High-throughput streaming in Lakehouse with Non-Blocking Concurrency Control in Apache Flink & Hudi
Taught by
StreamNative