
Coursera

Databricks Lakehouse Fundamentals

Pragmatic AI Labs via Coursera

Overview

Learn to build data pipelines on the Databricks Lakehouse Platform — from architecture concepts to hands-on Spark and Delta Lake. This beginner course starts with why the lakehouse pattern replaced separate data warehouses and data lakes, then moves directly into the Databricks workspace, where you'll configure compute, write PySpark and SQL queries, and manage data with Unity Catalog's three-level namespace.

Week by week, you'll progress from navigating the platform to transforming DataFrames with select, filter, groupBy, and joins, then to creating Delta Lake tables with ACID transactions, schema enforcement, and time travel. You'll perform real DML operations — INSERT, UPDATE, DELETE, and MERGE — and learn to schedule production pipelines using Databricks Jobs with DAG-based orchestration.

The course runs entirely on Databricks Free Edition, so there's no cloud billing. Six hands-on labs reinforce each module: explore the workspace, write notebook-based transformations, build Delta tables, and wire up an automated workflow. By the end, you'll have built a complete data engineering pipeline from raw ingestion through Delta Lake to scheduled production jobs.

Syllabus

  • Lakehouse Architecture and Workspace
    • This module introduces the lakehouse paradigm and the Databricks platform. You'll learn about the structure of lakehouse architecture, explore the Databricks workspace and its core tools, and understand how compute and storage work together.
  • Apache Spark on Databricks
    • This module covers notebooks and hands-on data manipulation using PySpark. You'll create and organize notebooks, load data from the Catalog, and write PySpark transformations to select, filter, aggregate, and join datasets.
  • Delta Lake Essentials
    • This module introduces Delta Lake, where you'll create Delta tables, perform transactional operations like updates, deletes, and merges, use time travel to query previous versions, and see how Delta Lake connects to governance and automation features.
  • Capstone
    • Build an end-to-end lakehouse data pipeline integrating every concept from the course. Starting from raw data files, you will construct a complete medallion architecture (bronze → silver → gold) with Delta Lake, implement incremental MERGE logic, and orchestrate the pipeline as a scheduled Databricks Job. Six hands-on lab notebooks guide you through the project using the course GitHub repository.

Taught by

Noah Gift

