In this hands-on, 1-hour project-based course, you will master real-time data processing with Apache Spark Structured Streaming. The course is designed for data engineers and developers who want practical experience building streaming data pipelines. You will begin by setting up the Spark environment and configuring micro-batches and fault-tolerance mechanisms through checkpointing. Next, you'll transform streaming data by applying filters, maps, and aggregations to extract meaningful insights, and handle out-of-order data with watermarks to keep your real-time analytics accurate. The course also introduces querying streaming data with SQL, allowing you to perform transformations and aggregations on live data. Finally, you will deploy your streaming pipeline to production by writing results to an external sink such as Parquet files.

This is an intermediate-level project. To succeed, you should have a basic understanding of Apache Spark and the PySpark API, proficiency in programming and big data, and some familiarity with writing SQL queries. This is the perfect opportunity for anyone looking to dive into real-time data processing and Spark Structured Streaming!