
The Data Engineering Bootcamp: Zero to Mastery

via Zero To Mastery

Overview

Learn Data Engineering end-to-end. Build real-time pipelines with Apache Kafka & Flink, data lakes on AWS, machine learning workflows with Spark, and integrate LLMs into production-ready systems. Designed to launch your career as a future-ready Data Engineer.
  • Learn the skills and real-world tools used by Data Engineers and join the top 10% in your field
  • Build stream-processing pipelines with Apache Kafka and Apache Flink
  • Create scalable, cloud-based data lakes on AWS using S3, EMR, and Athena
  • Develop distributed processing jobs with Apache Spark and orchestrate workflows with Apache Airflow
  • Future-proof your skills by learning to integrate AI & machine learning, including Spark ML and LLMs
  • Build real-world, production-ready projects and pipelines using popular open source software

Syllabus

  •   Introduction
    • The Data Engineering Bootcamp: Zero to Mastery
    • Exercise: Meet Your Classmates and Instructor
    • Course Resources
    • Understanding Your Video Player
    • Set Your Learning Streak Goal
  •   Section 00 - Introduction to Data Engineering
    • Introduction
    • Storing Data
    • Processing Data
    • Data Sources
    • Orchestration
    • Stream Processing
    • AI and ML with Data Engineering
    • Serving Data
    • Cloud and Data Engineering
    • Source Code for This Bootcamp
    • Prerequisites
    • What’s Next?
    • Installing Software for the Course
    • [Optional] Using Windows
  •   Section 01 - Data Engineering Fundamentals: Python, SQL + more
    • Introduction
    • Quick Note On This Section
    • Jupyter Notebooks
    • Python - Lists
    • Python - Tuples
    • Python - Dictionaries
    • Python - Sets
    • Python - Range
    • Python - Comprehensions
    • Python - String Formatting
    • Python - Functions
    • Python - Decorators
    • Python - Exceptions
    • Python - Classes - Part 1
    • Python - Classes - Part 2
    • Python - Iterators
    • CLI - Basic Commands
    • CLI - Combining Commands
    • CLI - Environment Variables
    • Virtual Environments - What Is a Virtualenv?
    • SQL - Introduction
    • SQL - Environment Set Up
    • SQL - Fetching Data
    • SQL - Grouping Rows
    • SQL - Joining Data
    • SQL - Creating Data
  •   Section 02 - Big Data Processing with Apache Spark: Process & Analyze Real-World Airbnb Data
    • Introduction
    • Apache Spark
    • How Spark Works
    • Spark Application
    • DataFrames
    • Installing Spark
    • Installing Spark on Linux
    • Inside Airbnb Data
    • Writing Your First Spark Job
    • Lazy Processing
    • [Note] Minor correction
    • [Exercise] Basic Functions
    • [Exercise] Basic Functions - Solution
    • Aggregating Data
    • Joining Data
    • Aggregations and Joins with Spark
    • Complex Data Types
    • [Exercise] Aggregate Functions
    • [Exercise] Aggregate Functions - Solution
    • User Defined Functions
    • Data Shuffle
    • Data Accumulators
    • Optimizing Spark Jobs
    • Submitting Spark Jobs
    • Other Spark APIs
    • Spark SQL
    • [Exercise] Advanced Spark
    • [Exercise] Advanced Spark - Solution
    • Summary
    • Let's Have Some Fun (+ More Resources)
  •   Section 03 - Creating a Data Lake with AWS
    • Introduction
    • What Is a Data Lake?
    • Amazon Web Services (AWS)
    • Simple Storage Service (S3)
    • Setting Up an AWS Account
    • Data Partitioning
    • Using S3
    • EMR Serverless
    • IAM Roles
    • Running a Spark Job
    • Parquet Data Format
    • Implementing a Data Catalog
    • Data Catalog Demo
    • Querying a Data Lake
    • Summary
    • Course Check-In
  •   Section 04 - Implementing Data Pipelines with Apache Airflow
    • Introduction
    • What Is Apache Airflow?
    • Airflow’s Architecture
    • Installing Airflow
    • Defining an Airflow DAG
    • Error Handling
    • Idempotent Tasks
    • Creating a DAG - Part 1
    • Creating a DAG - Part 2
    • Handling Failed Tasks
    • [Exercise] Data Validation
    • [Exercise] Data Validation - Solution
    • Spark with Airflow
    • Using Spark with Airflow - Part 1
    • Using Spark with Airflow - Part 2
    • Sensors In Airflow
    • Using File Sensors
    • Data Ingestion
    • Reading Data From Postgres - Part 1
    • Reading Data from Postgres - Part 2
    • [Exercise] Average Customer Review
    • [Exercise] Average Customer Review - Solution
    • Advanced DAGs
    • Summary
    • Unlimited Updates
  •   Section 05 - Machine Learning with Spark ML: Create a Data Pipeline, Train a Model + more
    • Introduction
    • What Is Machine Learning?
    • Regression Algorithms
    • Building a Regression Model
    • Training a Model
    • Model Evaluation
    • Testing a Regression Model
    • Model Lifecycle
    • Feature Engineering
    • Improving a Regression Model
    • Machine Learning Pipelines
    • Creating a Pipeline
    • [Exercise] House Price Estimation
    • [Exercise] House Price Estimation - Solution
    • [Exercise] Imposter Syndrome
    • Classification
    • Classifiers Evaluation
    • Training a Classifier
    • Hyperparameters
    • Optimizing a Model
    • [Exercise] Loan Approval
    • [Exercise] Loan Approval - Solution
    • Deep Learning
    • Summary
    • Implement a New Life System
  •   Section 06 - Using AI with Data Engineering: LLMs, HuggingFace + more
    • Introduction
    • Natural Language Processing (NLP) before LLMs
    • Transformers
    • Types of LLMs
    • Hugging Face
    • Databricks Set Up
    • Using an LLM
    • Structured Output
    • Producing JSON Output
    • LLMs With Apache Spark
    • Summary
  •   Section 07 - Real-Time Data Processing ("Stream Processing") with Apache Kafka
    • Introduction
    • What Is Apache Kafka?
    • Partitioning Data
    • Kafka API
    • Kafka Architecture
    • Set Up Kafka
    • Writing to Kafka
    • Reading from Kafka
    • Data Durability
    • Kafka vs Queues
    • [Exercise] Processing Records
    • [Exercise] Processing Records - Solution
    • Delivery Semantics
    • Kafka Transactions
    • Log Compaction
    • Kafka Connect
    • Using Kafka Connect
    • Outbox Pattern
    • Schema Registry
    • Using Schema Registry
    • Tiered Storage
    • [Exercise] Track Order Status Changes
    • [Exercise] Track Order Status Changes - Solution
    • Summary
  •   Section 08 - Stream Processing with Apache Flink
    • Introduction
    • What Is Apache Flink?
    • Flink Applications
    • Multiple Streams
    • Installing Apache Flink
    • Processing Individual Records
    • [Exercise] Stream Processing
    • [Exercise] Stream Processing - Solution
    • Time Windows
    • Keyed Windows
    • Using Time Windows
    • Watermarks
    • Advanced Window Operations
    • Stateful Stream Processing
    • Using Local State
    • [Exercise] Anomaly Detection
    • [Exercise] Anomaly Detection - Solution
    • Joining Streams
    • Summary
  •   Where To Go From Here?
    • Thank You!
    • Review This Course!
    • Become An Alumni
    • Learning Guideline
    • ZTM Events Every Month
    • LinkedIn Endorsements

Taught by

Ivan Mushketyk

