Methodology and Observation of Congestion Control Impact on MoE Training Job Completion Time
Open Compute Project via YouTube
Overview
Learn methodology and emulation techniques for quantifying AI network fabric performance in GPU clusters running Mixture-of-Experts (MoE) training workloads in this 15-minute conference presentation. Explore, through systematic observation and comparison, how congestion control and load balancing schemes affect training job completion time in AI data centers. Discover practical approaches to measuring the effectiveness of the network that interconnects GPU clusters, with a focus on real-world implementations that connect theoretical insights to actionable strategies for AI network infrastructure research and applications.
Syllabus
Methodology and Observation of Congestion Control Impact on MoE Training Job Completion Time
Taught by
Open Compute Project