Methodology and Observation of Congestion Control Impact on MoE Training Job Completion Time
Open Compute Project via YouTube
Overview
Learn a methodology and emulation techniques for quantifying AI network fabric performance in GPU clusters running Mixture-of-Experts (MoE) training workloads in this 15-minute conference presentation. Explore how congestion control and load balancing schemes affect training job completion time in AI data centers through systematic observation and comparison. Discover practical approaches to measuring the effectiveness of the network that interconnects GPU clusters, with a focus on real-world implementations that connect theoretical insights to actionable strategies for AI network infrastructure research and applications.
Syllabus
Methodology and Observation of Congestion Control Impact on MoE Training Job Completion Time
Taught by
Open Compute Project