
YouTube

Building Distributed Modern Data Lakehouse From Scratch with Apache Iceberg - An End to End Project

CodeWithYu via YouTube

Overview

Learn to build a comprehensive distributed data lakehouse from the ground up using Apache Iceberg, Trino, Airflow, DBT, MinIO, and Project Nessie in this hands-on tutorial. Master the high-level architecture of modern data lakehouses and understand Apache Iceberg fundamentals before diving into practical implementation. Set up a distributed Trino cluster with master-worker architecture, implement data orchestration using Apache Airflow, and create data transformations following the Medallion Architecture pattern with DBT. Integrate object storage with MinIO, manage versioned metadata through Project Nessie, and optimize Trino query performance for production environments. Explore distributed systems concepts, data pipeline orchestration techniques, and modern data lakehouse best practices through real-world implementation. Work with DataGrip for query execution and analysis while gaining practical insights into query optimization and performance tuning strategies for large-scale data processing systems.
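The Medallion Architecture mentioned above organizes data into progressively refined layers (commonly bronze for raw ingested records, silver for cleaned and deduplicated data, gold for business-level aggregates), with a DAG enforcing the execution order between layers. A minimal conceptual sketch in plain Python, independent of DBT, Airflow, or Trino (all record fields and layer functions here are illustrative, not taken from the course):

```python
from graphlib import TopologicalSorter

# Bronze layer: raw records exactly as ingested (may contain
# duplicates and unparseable values).
raw_events = [
    {"user": "a", "amount": "10.5"},
    {"user": "a", "amount": "10.5"},  # duplicate row
    {"user": "b", "amount": "oops"},  # unparseable amount
    {"user": "c", "amount": "7.0"},
]

def bronze():
    """Land the raw data untouched."""
    return list(raw_events)

def silver(rows):
    """Deduplicate and cast types, dropping bad records."""
    seen, out = set(), []
    for r in rows:
        key = (r["user"], r["amount"])
        if key in seen:
            continue
        seen.add(key)
        try:
            out.append({"user": r["user"], "amount": float(r["amount"])})
        except ValueError:
            continue  # discard rows that fail type casting
    return out

def gold(rows):
    """Aggregate cleaned data into per-user totals."""
    totals = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return totals

# Layer dependencies, resolved the way an orchestrator such as
# Airflow would: each task maps to the set of tasks it depends on.
dag = {"silver": {"bronze"}, "gold": {"silver"}}
order = list(TopologicalSorter(dag).static_order())

results = {}
results["bronze"] = bronze()
results["silver"] = silver(results["bronze"])
results["gold"] = gold(results["silver"])
print(order)            # ['bronze', 'silver', 'gold']
print(results["gold"])  # {'a': 10.5, 'c': 7.0}
```

In the course's stack, each layer would instead be a DBT model materialized as an Iceberg table and scheduled by an Airflow DAG; the sketch only shows the layering and ordering idea.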

Syllabus

0:00 Introduction
1:02 High Level System Architecture Walkthrough
12:00 What is Apache Iceberg?
24:00 Setting up Distributed Data Lakehouse from Scratch
40:15 Apache Airflow DAG Pipeline
1:30:55 Trino Cluster Optimisation
1:32:37 Trino Query Engine with DataGrip
1:41:17 Results and Discussion
1:46:10 DBT Project Setup with Medallion Architecture
1:55:00 Outro

Taught by

CodeWithYu

