Ultimate AWS Data Engineering Bootcamp: 15 Real-World Labs

via Udemy

Overview

Master AWS data engineering with hands-on labs: batch, event-driven, and real-time data processing on AWS.

What you'll learn:
  • Master AWS data engineering tools and services with hands-on real-world projects.
  • Design, build, and optimize scalable data pipelines on AWS from scratch.
  • Implement advanced data engineering techniques, including batch, real-time streaming, and event-driven processing.
  • Gain practical experience with Spark and major AWS services such as Glue, Kinesis, ECS, EMR, and more.

Welcome to a definitive course for mastering data engineering on AWS. This comprehensive bootcamp is designed to take you from beginner to expert, equipping you with the skills to tackle real-world data challenges using the most powerful AWS services and tools.

What You’ll Learn:

In this course, you’ll dive deep into the core aspects of data engineering, focusing on both batch and real-time data processing. You’ll gain hands-on experience with:

  • Batch ETL and Processing with PySpark on AWS Glue and EMR: Learn to design, implement, and optimize scalable ETL pipelines, transforming raw data into actionable insights.

  • Real-Time Streaming with PySpark Streaming: Master real-time data processing and analytics to handle streaming data with precision and efficiency.

  • Containerized Python Workloads with ECS: Discover how to manage and deploy containerized Python applications on AWS, leveraging ECS for scalability and reliability.

  • Data Orchestration with Airflow and Step Functions: Orchestrate complex workflows and automate data pipelines using the best-in-class tools for data orchestration.

  • Event-Driven and Real-Time Processing with AWS Kinesis: Build robust, event-driven architectures and process streaming data in real time, keeping your data pipelines up to date.

  • Data Warehousing with Amazon Redshift: Explore the intricacies of Redshift, AWS’s powerful data warehouse, to store and analyze massive datasets efficiently.

  • Database Management with Aurora MySQL and DynamoDB: Get hands-on with relational and NoSQL databases, optimizing data storage and retrieval for different use cases.

  • Serverless Data Processing with Lambda Functions: Harness the power of AWS Lambda to process data in real time, triggering workflows based on events.

  • Glue Python Shell Jobs for Python Workloads: Utilize Glue’s Python shell jobs to run Python scripts in a managed environment, perfect for custom data processing tasks.

  • Delta Lake on Spark: Understand the concepts behind Delta Lake and a lakehouse architecture, and how it enhances Spark for building reliable, scalable data lakes.

  • CI/CD with GitHub Actions: Implement continuous integration and continuous delivery pipelines, automating your data engineering workflows with GitHub Actions.
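
As a rough illustration of the event-driven processing with Lambda and Kinesis mentioned in the list above, here is a minimal sketch of a Lambda handler that decodes a batch of Kinesis records. It is not taken from the course materials. The record layout (base64-encoded payloads under `Records[].kinesis.data`) is the standard shape Lambda receives from a Kinesis stream, while the `trip_id` and `fare` fields and the `make_event` helper are hypothetical names introduced only for this example.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records passed to a Lambda function and total a fare field."""
    trips = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded under record["kinesis"]["data"].
        payload = base64.b64decode(record["kinesis"]["data"])
        trips.append(json.loads(payload))
    # "fare" is a hypothetical field used only for illustration.
    total_fare = sum(trip.get("fare", 0) for trip in trips)
    return {"records_processed": len(trips), "total_fare": total_fare}


def make_event(payloads):
    """Hypothetical helper: build a minimal Kinesis-style event for local testing."""
    return {
        "Records": [
            {
                "kinesis": {
                    "data": base64.b64encode(
                        json.dumps(payload).encode("utf-8")
                    ).decode("ascii")
                }
            }
            for payload in payloads
        ]
    }


if __name__ == "__main__":
    sample = make_event([{"trip_id": 1, "fare": 12.5}, {"trip_id": 2, "fare": 7.5}])
    print(handler(sample, None))
```

Because the handler is a plain function, it can be exercised locally with a synthetic event before being deployed behind a real stream.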

Why This Course?

This bootcamp is not just another theoretical course – it’s packed with real-world labs that simulate the challenges data engineers face daily. You’ll get to build, deploy, and manage data pipelines and architectures that you can directly apply in your work or projects. Whether you’re just starting out or looking to level up your skills, this course provides everything you need to become an AWS data engineering expert.

Syllabus

  • Course Introduction
  • Lab - Batch data processing of music streams using Airflow & Redshift
  • Lab - Distributed music streams processing using Airflow, Spark & DynamoDB
  • Lab - ETL for rental apartments using Step Functions, AWS Glue and Redshift
  • Lab - Build a data lake for a rental vehicles store using EMR, S3 and Athena
  • Lab - Build event-driven pipelines for e-commerce using ECS and Step Functions
  • Lab - Build a lakehouse for an e-commerce store using PySpark Delta tables and S3
  • Lab - Event-driven data processing for taxi trips using Lambda and Kinesis
  • Lab - Process mobile network logs in real time using PySpark & Streamlit on ECS
  • Lab - CI/CD for AWS services using GitHub Actions
  • Lab - Real-time data ingestion of clickstreams using Kinesis Firehose and Redshift
  • Assignment 1 - Set up a MySQL database in AWS Aurora RDS
  • Assignment 2 - Build a lakehouse on S3 for a commercial flights dataset
  • Assignment 3 - Offer dynamic discounts to e-commerce users using real-time events
  • Assignment 4 - Set up a real-time PySpark streaming job for Spotify song metrics
  • Assignment 5 - Automate deployment of Lambda functions using GitHub Actions

Taught by

Sid Raghunath

Reviews

4.5 rating at Udemy based on 176 ratings
