Fundamentals of Apache Spark and PySpark

via Zero To Mastery Path

Overview

Get hands-on with Apache Spark and PySpark by learning how to build scalable, high-performance data pipelines using the DataFrame API, Spark jobs, joins, aggregations, and more.

Learn the skills and real-world tools used by Data Engineers and join the top 10% in your field
Set up Apache Spark and configure your local or cloud environment for big data processing
Write efficient PySpark code to handle, transform, and analyze large-scale datasets
Use DataFrames to manipulate data in a distributed computing environment
Build scalable data pipelines that integrate multiple transformation and aggregation steps
Create a strong foundation for a career in Data Engineering, Data Science, and AI/ML

Syllabus

Introduction

Introduction
Exercise: Meet Your Classmates and Instructor
Course Resources

Setup and Useful Resources

[Optional] UNIX CLI Commands
[Optional] Using Windows
Installing Software for the Course
[Optional] What Is a Virtualenv?

Big Data Processing with Apache Spark

Apache Spark
How Spark Works
Spark Application
DataFrames
Installing Spark
Installing Spark on Linux
Inside Airbnb Data
Writing Your First Spark Job
Lazy Processing
[Exercise] Basic Functions
[Exercise] Basic Functions - Solution
Aggregating Data
Joining Data
Aggregations and Joins with Spark
Complex Data Types
[Exercise] Aggregate Functions
[Exercise] Aggregate Functions - Solution
User Defined Functions
Data Shuffle
Data Accumulators
Optimizing Spark Jobs
Submitting Spark Jobs
Other Spark APIs
Spark SQL
[Exercise] Advanced Spark
[Exercise] Advanced Spark - Solution
Summary

Where To Go From Here?

Let's Keep Learning Together!
Review This Byte!

Taught by

Ivan Mushketyk
