

Process Real-Time Data with Spark Streams

Coursera via Coursera

Overview

Real-time data is everywhere: fraud detection in financial transactions, personalized recommendations in e-commerce, anomaly detection in IoT devices. Traditional batch processing is too slow for these use cases; businesses need insights the moment data is generated. This course teaches you how to design, build, and operate reliable streaming pipelines using Apache Spark Structured Streaming and Kafka.

You'll start with the fundamentals of Spark's streaming model, learning how micro-batching, triggers, and checkpoints enable continuous, fault-tolerant processing. You'll then connect Spark to real-world sources such as Kafka, apply event-time processing with watermarks, and deliver results to Delta Lake. Finally, you'll take pipelines to production by enriching streams with static data, monitoring query health, handling failures, and ensuring scalability.

Learners should have a basic understanding of Python programming and Spark DataFrames, along with familiarity with JSON and SQL. By the end, you'll have the skills to confidently implement streaming solutions that power real-time decision-making in modern data-driven organizations.
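To give a sense of what these pieces look like together, here is a minimal PySpark sketch of the kind of pipeline the course describes: reading from Kafka, parsing JSON, aggregating on event time with a watermark, and writing to Delta Lake with a checkpoint. The topic name, schema, and paths are illustrative assumptions, not taken from the course materials, and the Kafka and Delta Lake connectors must be available on the cluster.

```python
# Illustrative sketch only -- topic, schema, and paths are assumptions, not course code.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Hypothetical schema for JSON messages arriving on a Kafka topic
schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a continuous stream of records from Kafka (micro-batch execution)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "transactions")
       .load())

# Parse the JSON payload into typed columns
events = (raw
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Event-time aggregation; the watermark bounds how late data may arrive
totals = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(window(col("event_time"), "5 minutes"), col("user_id"))
          .sum("amount"))

# Write results to a Delta Lake table; the checkpoint makes the query restartable
query = (totals.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/tmp/checkpoints/transactions")
         .trigger(processingTime="1 minute")
         .start("/tmp/delta/transaction_totals"))

query.awaitTermination()
```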

Syllabus

  • Structured Streaming Fundamentals
    • Learners are introduced to the Spark Structured Streaming model and its core concepts, including micro-batch execution, triggers, checkpoints, output modes, and data transformations.
  • Sources, Sinks and Stateful Aggregations
    • This module focuses on integrating Spark with real-world streaming systems. Learners will consume data from Kafka, transform and parse messages, and write results to sinks such as Delta Lake, ensuring reliability with checkpointing and triggers.
  • Building and Operating a Production-Ready Stream
    • Learners design an end-to-end streaming pipeline that combines ingestion, transformation, enrichment with static datasets, and reliable output; a minimal sketch of the enrichment step appears after this syllabus.
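
The enrichment step in the final module is a stream-static join: each micro-batch of the stream is joined against a reference DataFrame loaded once. The sketch below assumes Delta tables at hypothetical paths and an illustrative user_id key; it is not code from the course.

```python
# Illustrative sketch only -- paths and column names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("enrichment-sketch").getOrCreate()

# Static reference data loaded once (e.g., a customer dimension table)
customers = spark.read.format("delta").load("/tmp/delta/customers")

# Streaming facts arriving continuously (here read as a stream from a Delta table)
transactions = spark.readStream.format("delta").load("/tmp/delta/transaction_totals")

# Stream-static join: every micro-batch is enriched against the static DataFrame
enriched = transactions.join(customers, on="user_id", how="left")

query = (enriched.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/tmp/checkpoints/enriched")
         .start("/tmp/delta/enriched_totals"))

# Basic health monitoring: status now, and metrics such as input rate and
# batch duration (lastProgress is None until the first micro-batch completes)
print(query.status)
print(query.lastProgress)
```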

Taught by

Caio Avelino and Starweaver

Reviews

