Master the design and implementation of consistent streaming data pipelines using Apache Kafka, Spark, and Flink. In this hands-on course, you'll apply systematic decision frameworks to select appropriate delivery guarantees (at-most-once, at-least-once, exactly-once) based on business requirements and failure scenario analysis. You'll implement end-to-end exactly-once processing by configuring Kafka producer transactions, Spark Structured Streaming checkpoints, and Hudi transactional tables, then validate your implementation through integration testing with failure injection. Finally, you'll evaluate watermarking strategies by analyzing event arrival patterns to optimize the latency-completeness tradeoff and meet specific SLA requirements. Through realistic scenarios, from preventing duplicate billing in order processing to optimizing IoT event pipelines for sub-10-second P95 latency, you'll develop the skills to architect production streaming systems that balance correctness, performance, and operational simplicity.
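To give a concrete flavor of the exactly-once material, here is a minimal sketch of a transactional Kafka producer using the confluent-kafka Python client; the broker address, topic, and transactional.id are illustrative placeholders, not course-provided values:

```python
from confluent_kafka import Producer, KafkaException

# Assumptions: a local broker at localhost:9092 and a "billing-events" topic.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,               # broker deduplicates producer retries
    "transactional.id": "orders-billing-1",   # hypothetical id; required for transactions
})

producer.init_transactions()   # register the transactional.id with the broker
producer.begin_transaction()
try:
    producer.produce("billing-events", key="order-123", value=b'{"amount": 42}')
    producer.commit_transaction()  # message becomes visible atomically
except KafkaException:
    producer.abort_transaction()   # on failure, consumers never see a partial write
```

Consumers reading with isolation.level set to read_committed observe either all of a transaction's messages or none of them, which is the mechanism behind the duplicate-billing scenario above.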
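Likewise, the watermarking tradeoff looks roughly like the following PySpark sketch; the 30-second watermark, topic name, and checkpoint path are assumptions for illustration, and tuning that bound against observed event arrival patterns is exactly the skill the course builds:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

# Assumes the spark-sql-kafka connector package is on the classpath.
spark = SparkSession.builder.appName("iot-pipeline").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed local broker
    .option("subscribe", "iot-events")                    # hypothetical topic
    .load()
)

# A 30-second watermark is an illustrative choice: events arriving more than
# 30s behind the latest observed event time are dropped from the aggregation,
# trading completeness for bounded state and lower latency.
counts = (
    events
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp AS event_time")
    .withWatermark("event_time", "30 seconds")
    .groupBy(window(col("event_time"), "10 seconds"))
    .count()
)

# The checkpoint location lets Spark recover offsets and aggregation state
# after a failure, the other half of the exactly-once story.
query = (
    counts.writeStream
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/iot")  # assumed local path
    .format("console")
    .start()
)
```

Widening the watermark admits more late data at the cost of latency and state size; narrowing it does the opposite, which is the latency-completeness tradeoff you'll analyze against SLA targets.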
Intermediate data and platform engineers using Kafka, Spark, or Flink who want to design production streaming pipelines with correct delivery guarantees, exactly-once semantics, and low-latency processing.
Foundational knowledge of distributed systems; basic experience with Apache Kafka or similar messaging systems; familiarity with SQL; and introductory experience with stream or batch data processing concepts.
By the end of this course, you will be able to design and validate production-ready streaming pipelines with correct delivery guarantees, exactly-once semantics, and low-latency event-time processing.