
YouTube

AutoCCL - Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training

USENIX via YouTube

Overview

Learn about AutoCCL, an automated tuning method for optimizing collective communication libraries in distributed deep neural network training, in this 15-minute conference presentation from NSDI '25. Discover how researchers from the University of Science and Technology of China and Microsoft Research address the critical challenge of parameter selection in communication libraries, an aspect often overlooked in network optimizations. Explore the divide-and-conquer algorithm that tackles state explosion in the configuration search space by decoupling implementation-related parameters from search-sensitive ones. Understand the online tuning approach that accounts for communication-computation interference while hiding tuning overhead within early training iterations. Examine the implementation built on top of NVIDIA's NCCL library and review evaluation results showing 1.24-1.29× speedups on microbenchmarks compared to NCCL, up to 1.80× improvements with concurrent computation, and 1.07-1.32× speedups in per-iteration training time for large language models and vision models across multi-node GPU clusters with various interconnect configurations.
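The divide-and-conquer idea described above can be illustrated in pure Python. This is a minimal sketch, not AutoCCL's actual algorithm: the parameter names are loosely modeled on NCCL-style knobs, and the cost function is a hypothetical stand-in for running a communication microbenchmark.

```python
import itertools

# Illustrative configuration space (hypothetical, NCCL-flavored names).
SPACE = {
    "algo": ["ring", "tree"],            # implementation-related
    "proto": ["LL", "LL128", "simple"],  # implementation-related
    "nchannels": [2, 4, 8, 16],          # search-sensitive
    "nthreads": [64, 128, 256, 512],     # search-sensitive
}

def measure(cfg):
    """Stand-in for timing a collective on real hardware (lower is better)."""
    cost = 100.0
    cost -= {"ring": 5, "tree": 8}[cfg["algo"]]
    cost -= {"LL": 2, "LL128": 6, "simple": 4}[cfg["proto"]]
    cost += abs(cfg["nchannels"] - 8) * 1.5
    cost += abs(cfg["nthreads"] - 256) / 64
    return cost

def exhaustive(space):
    """Search the full Cartesian product (state explosion: 2*3*4*4 = 96 trials)."""
    keys = list(space)
    best = min(
        (dict(zip(keys, vals)) for vals in itertools.product(*space.values())),
        key=measure,
    )
    return best, measure(best)

def divide_and_conquer(space, groups):
    """Tune each decoupled parameter group separately (6 + 16 = 22 trials here),
    holding the other parameters fixed at their current best values."""
    cfg = {k: v[0] for k, v in space.items()}  # start from defaults
    for group in groups:
        sub = {k: space[k] for k in group}
        for vals in itertools.product(*sub.values()):
            trial = {**cfg, **dict(zip(sub, vals))}
            if measure(trial) < measure(cfg):
                cfg = trial
    return cfg, measure(cfg)

best_full, cost_full = exhaustive(SPACE)
best_dc, cost_dc = divide_and_conquer(
    SPACE, groups=[("algo", "proto"), ("nchannels", "nthreads")]
)
print(cost_full, cost_dc)  # the grouped search visits far fewer configurations
```

Because this toy cost function is separable across the two groups, the grouped search reaches the same optimum as the exhaustive one while visiting roughly a quarter of the configurations; the real system additionally has to tune online, under interference from concurrent computation.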

Syllabus

NSDI '25 - AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training

Taught by

USENIX

