Big Data with PySpark

Overview

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will apply parallel computation to large datasets and get ready for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Song Dataset.

Syllabus

Introduction to PySpark

Master PySpark to handle big data with ease: learn to process, query, and optimize massive datasets for powerful analytics!

Big Data Fundamentals with PySpark

Learn the fundamentals of working with big data with PySpark.

Cleaning Data with PySpark

Learn how to clean data with Apache Spark in Python.

Feature Engineering with PySpark

Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.

Machine Learning with PySpark

Learn how to make predictions from data with Apache Spark, using decision trees, logistic regression, linear regression, ensembles, and pipelines.

Building Recommendation Engines with PySpark

Learn tools and techniques to leverage your own big data to facilitate positive experiences for your users.

Building a Demand Forecasting Model

Taught by

Nick Solomon, Lore Dirick, John Hogue, Shantanu Trivedi, Upendra Kumar Devisetty, Andrew Collier, and Mike Metzger
