Big Data with PySpark

via DataCamp

Overview

Advance your data skills by mastering Apache Spark. Using the Spark Python API, PySpark, you will leverage parallel computation with large datasets, and get ready for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Songs dataset.

Syllabus

  • Introduction to PySpark
    • Master PySpark to handle big data with ease: learn to process, query, and optimize massive datasets for powerful analytics.
  • Big Data Fundamentals with PySpark
    • Learn the fundamentals of working with big data using PySpark.
  • Cleaning Data with PySpark
    • Learn how to clean data with Apache Spark in Python.
  • Feature Engineering with PySpark
    • Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.
  • Machine Learning with PySpark
    • Learn how to make predictions from data with Apache Spark, using decision trees, logistic regression, linear regression, ensembles, and pipelines.
  • Building Recommendation Engines with PySpark
    • Learn tools and techniques to leverage your own big data to facilitate positive experiences for your users.
  • Building a Demand Forecasting Model

Taught by

Nick Solomon, Lore Dirick, John Hogue, Shantanu Trivedi, Upendra Kumar Devisetty, Andrew Collier, and Mike Metzger
