
Coursera

Ultimate AWS Data Engineering Bootcamp - 15 Real-World Labs

Packt via Coursera

Overview

This course features Coursera Coach, an interactive tool for real-time conversations that help you test your knowledge, challenge assumptions, and deepen your understanding as you progress through the course.

This course provides hands-on experience with essential AWS data engineering tools and techniques. You'll work on real-world projects using services like Redshift, DynamoDB, Athena, Glue, Kinesis, and Step Functions to build data pipelines, automate workflows, and process data at scale. Through labs, you'll learn to develop batch and real-time data processing solutions, build scalable data lakes, and implement event-driven pipelines for e-commerce.

The course is aimed at learners with basic knowledge of data engineering, cloud services, and programming, and it takes you through 15 labs covering AWS data engineering practices. Familiarity with AWS tools is beneficial but not required. By the end, you'll have the skills to tackle complex data engineering tasks and deploy cloud-based solutions confidently.

Syllabus

  • Course Introduction
    • In this module, we will set the foundation for your journey through AWS data engineering. You'll gain clarity on the course structure, explore the tech stack—including Docker, AWS CLI, and more—and ensure your local environment is ready for executing the real-world labs. This introduction is critical to align expectations and configure the tools required for success.
  • Lab - Batch data processing of music streams using Airflow & Redshift
    • In this module, we will implement a batch data processing project for music streaming data. You'll learn to use Airflow for orchestration and Redshift Serverless for storage and querying, culminating in a full pipeline execution. The focus is on understanding the interaction between orchestration tools and AWS services.
  • Lab - Distributed music streams processing using Airflow, Spark & DynamoDB
    • In this module, we will process music stream data using a distributed system that combines PySpark and DynamoDB. You'll use Airflow to orchestrate the workflow and execute jobs using the AWS Glue Docker image locally. This project introduces scalable and parallel data processing techniques.
  • Lab - ETL for Rental apartments using Step Functions, AWS Glue, and Redshift
    • In this module, we will build a robust ETL pipeline for rental apartment data. You will set up MySQL in AWS Aurora, use Glue for data transformation, and orchestrate the workflow using Step Functions and EventBridge. This lab emphasizes automation and modular pipeline execution.
  • Lab - Build a data lake for a rental vehicle store using EMR, S3 and Athena
    • In this module, we will create a data lake for a rental vehicle store using scalable services like EMR and Athena. You'll execute PySpark both locally and in the cloud, integrate metadata using Glue crawlers, and automate the pipeline using Step Functions.
  • Lab - Build event-driven pipelines for E-Commerce using ECS and Step Functions
    • In this module, we will develop an event-driven data pipeline tailored for an e-commerce application. You'll containerize Python apps, deploy them using ECS, and automate workflows using Step Functions and EventBridge. This lab blends DevOps and data pipeline principles.
  • Lab - Build a lakehouse for an E-Commerce store using PySpark Delta tables and S3
    • In this module, we will build a lakehouse architecture combining the flexibility of data lakes and the performance of data warehouses. You will use PySpark with Delta Lake, manage metadata with Glue Catalog, and query data through Athena and Redshift.
  • Lab - Event-driven data processing for Taxi trips using Lambda and Kinesis
    • In this module, we will implement real-time processing of taxi trip data using a serverless approach. You'll set up Kinesis streams, deploy Lambda functions, and execute a complete pipeline. This lab reinforces serverless computing and event-driven design.
  • Lab - Process mobile network logs in real time using PySpark & Streamlit on ECS
    • In this module, we will process mobile network logs using real-time technologies and deliver interactive insights via Streamlit. You'll build and deploy dashboards to ECS, leveraging Spark for streaming data and Glue Catalog for metadata management.
  • Lab - CI/CD for AWS Services using GitHub Actions
    • In this module, we will set up CI/CD pipelines to automate deployment of AWS Glue jobs, ECS tasks, and Lambda functions using GitHub Actions. You'll learn how to build and manage version-controlled workflows for repeatable deployments.
  • Lab - Real-time data ingestion of clickstreams using Kinesis Firehose and Redshift
    • In this module, we will ingest real-time clickstream data using Kinesis Firehose and enrich it using Lambda before storing it in Redshift. You'll build a robust pipeline suitable for web analytics or behavioral tracking applications.
  • Set up a MySQL Database in AWS Aurora RDS
    • In this module, we will challenge you to independently set up a MySQL database on AWS Aurora. This assignment reinforces database fundamentals and AWS RDS deployment skills.
  • Build a lakehouse on S3 for Commercial flights dataset
    • In this module, you will independently implement a lakehouse architecture for a commercial flights dataset. This assignment consolidates your understanding of data lakes, Delta tables, and metadata integration with Glue.
  • Offer dynamic discounts to E-Commerce users using real-time events
    • In this module, you'll build a real-time system that dynamically adjusts pricing for e-commerce users based on events. This assignment emphasizes practical business applications of event-driven data processing.
  • Set up a real-time PySpark streaming job for Spotify songs metrics
    • In this module, you'll build a real-time streaming job to process Spotify metrics. This assignment helps you apply PySpark and AWS Glue in real-world streaming scenarios.
  • Automate deployment of Lambda functions using GitHub Actions
    • In this final module, you'll implement CI/CD automation for Lambda functions using GitHub Actions. This assignment solidifies your DevOps knowledge and prepares you for real-world deployment automation.

Taught by

Packt - Course Instructors
