Overview
Explore how Ray Data provides fast, flexible, and scalable data loading for machine learning training pipelines in this 31-minute conference talk. The talk compares the performance of several open-source data loaders and reports that Ray Data matches PyTorch DataLoader and tf.data on a single node while adding features for larger-scale workloads: in-memory streaming, automatic recovery from out-of-memory failures, and support for heterogeneous clusters. The speakers present Ray Data as faster, more scalable, and more flexible than other open-source data loaders, and explain how it addresses increasingly complex preprocessing requirements across diverse data types. The accompanying slide deck summarizes the concepts and techniques presented.
Syllabus
Fast, Flexible, and Scalable Data Loading for ML Training with Ray Data
Taught by
Anyscale