A Unified Solution for Data Management and Model Training With Apache Iceberg and Mosaic Streaming

Databricks via YouTube

Overview

Explore how ByteDance addresses data management and model training challenges with Magnus (an enhanced Apache Iceberg) and Byted Streaming (a customized Mosaic Streaming) in this 32-minute conference talk. Learn how ByteDance uses Iceberg's branch/tag functionality to manage massive datasets and checkpoints, and how enhanced metadata and a custom C++ data reader improve sharding, shuffling, and data loading performance. Discover the flexible table migration capabilities, detailed metrics, and built-in full-text indexes on Iceberg tables that ensure training reliability. Understand how the team customized Mosaic Streaming to handle ultra-large datasets, resolving slow startup times, high resource consumption, and limited data source compatibility. Gain insight into the technical enhancements made to both Magnus and Byted Streaming, and see demonstrations of how these solutions enable efficient, robust distributed training at scale.
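
For readers new to the sharding and shuffling mentioned above, the following is a minimal, generic sketch of the idea in plain Python. It is not ByteDance's implementation and does not use the Magnus, Byted Streaming, or Mosaic Streaming APIs; the function name `shard_and_shuffle` and its parameters are hypothetical, chosen only for illustration. It assumes every worker knows the dataset size, its own rank, and a shared seed.

```python
import random


def shard_and_shuffle(num_samples, world_size, rank, seed, epoch):
    """Return the sample indices that one worker reads in one epoch.

    Hypothetical helper for illustration only; not part of any library.
    """
    indices = list(range(num_samples))
    # Every worker seeds an identical generator, so all ranks compute
    # the same shuffled order without communicating.
    random.Random(seed + epoch).shuffle(indices)
    # Strided slicing hands each rank a disjoint slice of that order.
    return indices[rank::world_size]


# Three workers split ten samples for epoch 0.
parts = [shard_and_shuffle(10, 3, rank, seed=42, epoch=0) for rank in range(3)]
for rank, part in enumerate(parts):
    print(f"rank {rank}: {part}")
```

Because the shuffle is seeded by `seed + epoch`, a restarted worker can recompute the same order for a given epoch. That property matters when training resumes from a checkpoint, although production systems typically shuffle at the shard or block level rather than over a full in-memory index list.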

Syllabus

A Unified Solution for Data Management and Model Training With Apache Iceberg and Mosaic Streaming

Taught by

Databricks
