
Data Transformation with Spark SQL in Databricks

via DataCamp

Overview

Build end-to-end data pipelines, from cleaning and aggregation to streaming and orchestration.


Ready to handle real-world data at scale? This course teaches you to transform large datasets using Spark SQL and PySpark in Databricks. Learn to shape and clean data, run aggregations with optimized joins, and apply window functions for advanced analytics. You'll also set up file-based streaming with fault-tolerant checkpoints and persist results as Delta tables. By the end, you'll be orchestrating multi-step production pipelines with Databricks Workflows and Lakeflow Declarative Pipelines.

Syllabus

  • Loading and Shaping Data
    • In this chapter, you'll learn how to work with Databricks notebooks, load CSV data into Spark DataFrames, and shape data using PySpark and SQL.
  • Data Cleaning and Optimization
    • Learn how to define explicit schemas, build a data cleaning pipeline, and optimize query performance with broadcast joins.
  • Analytics and Production Pipelines
    • Learn how to calculate running totals and rankings with window functions, build streaming pipelines, and deploy production workflows.

Taught by

Disha Mukherjee
