Netflix's Journey to an Apache Iceberg Data Lake - From Hive to Exabyte Scale

AWS Events via YouTube

Overview

Learn about Netflix's large-scale data migration journey in this AWS re:Invent 2023 conference session that details the transition from Apache Hive to Apache Iceberg for managing their one-exabyte data lake. Discover the technical challenges and solutions involved in migrating 300 petabytes of data, including the development of custom tooling, secure Iceberg tables, and an Iceberg REST catalog. Explore Netflix's Big Data Platform Architecture, metadata services, table management services, and innovative features like Autotune and Autolift. Gain insights into the migration strategy that minimized data movement, the state machine implementation for migration tooling, and solutions for user friction issues. Understand the benefits of Apache Iceberg's capabilities such as time travel and schema evolution, and learn how Netflix successfully executed this massive data warehouse transformation while maintaining operational efficiency.

Syllabus

Introduction
Agenda
Big Data Platform Architecture
Netflix's history with Hive
Metadata services
Table management services
Autotune
Autolift
Secure Iceberg tables
Iceberg access model
Key objectives
Minimize overall data movement
Hive to Iceberg migration tooling
Migration tooling auxiliary services
Migration tooling state machine
Benefits of migration tooling
Instantaneous revert operation
User friction issues
Conclusion
Open sourcing
Thank you

Taught by

AWS Events
