Overview
Learn about ZEN, a gradient synchronization system that exploits tensor sparsity to accelerate distributed deep learning training, in this 17-minute conference presentation from OSDI '25. Discover how researchers from Rice University and Stevens Institute of Technology analyzed the characteristics of sparse tensors in popular models to understand the fundamentals of sparsity, then systematically explored the design space of communication schemes to identify the optimal ones. Explore how ZEN builds on that analysis to address the communication bottleneck in distributed training by taking full advantage of the high tensor sparsity commonly observed in deep learning models. Understand how ZEN achieves up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to existing state-of-the-art methods, making it more efficient to scale model training across multiple GPUs.
Syllabus
OSDI '25 - ZEN: Empowering Distributed Training with Sparsity-driven Data Synchronization
Taught by
USENIX