High Performance Storage Solution for Large-scale ML Systems
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk on building high-performance storage for large-scale machine learning systems. Discover how I/O bottlenecks can significantly increase training time and limit system scalability, especially when moving data from global filesystems. Learn about approaches to these challenges, including the adoption of high-speed hardware and software improvements such as thread models, load-balancing SDKs, read/write splitting, and read path optimization. Gain insights into achieving lower latency and higher throughput for more efficient ML model training and data processing.
Syllabus
High Performance Storage Solution for Large-scale ML Systems - Hongjian Yu & Pengfei Zheng
Taught by
CNCF [Cloud Native Computing Foundation]