Overview
This 26-minute conference talk from Anyscale presents Exoshuffle, a system for large-scale data processing. Shuffle is a key primitive in data processing applications, and Exoshuffle challenges conventional wisdom by implementing a high-performance, reliable shuffle on Ray, a general-purpose distributed computing system. The talk shows how Exoshuffle outperforms Spark and reaches 82% of theoretical performance on a 100 TB sort using 100 nodes. It also covers the integration of Exoshuffle with the Datasets library in Ray 2.0, which gives machine learning users large-scale shuffle capabilities. The talk is aimed at data scientists, engineers, and anyone interested in large-scale data processing techniques.
Syllabus
Large-scale data shuffle in Ray with Exoshuffle
Taught by
Anyscale