
Codecademy

Introduction to Big Data with PySpark

via Codecademy

Overview

Dive into the world of big data with PySpark, the Python API for Apache Spark and a powerful tool for data processing and analysis. This course introduces the fundamental concepts of big data and how big data affects fields like data science, engineering, and machine learning. You'll learn to manage and analyze large datasets with Python and PySpark, which make big data more accessible.

Syllabus

  • Introduction to Big Data: Learn how big data is defined, how it is stored and processed, and which ethical considerations to keep in mind.
    • Article: What is Big Data?
    • Article: Bias in Data
    • Article: Big Data Storage and Computing
    • Quiz: Introduction to Big Data
  • Spark RDDs with PySpark: Learn one way that Spark handles big data: through Resilient Distributed Datasets (RDDs).
    • Article: What is Spark?
    • Lesson: RDDs with PySpark
    • Quiz: Introduction to PySpark RDDs
  • Spark DataFrames with PySpark SQL: Learn how PySpark lets you run SQL-like queries on large datasets.
    • Lesson: PySpark SQL
    • Project: Analyzing Wikipedia Clickstreams with PySpark
    • Quiz: PySpark SQL
  • Putting it all together: Combine everything you've learned so far about PySpark to work with a big data dataset.
    • Project: Analyze Common Crawl Data with PySpark

Taught by

Andrea Hassler

Reviews

4.4 rating at Codecademy based on 308 ratings
