Overview
Explore a 20-minute conference presentation from USENIX ATC '25 that introduces PopFetcher, a system that accelerates Mixture-of-Experts (MoE) model training through popularity-based expert prefetching. Learn how the researchers tackle the critical bottlenecks in MoE training, where sparse expert activation creates substantial All-to-All communication overhead and imbalanced computation workloads that severely degrade training efficiency.

Discover how PopFetcher exploits the skewed and correlated patterns in expert selection, using a lightweight sliding-window technique to predict expert popularity accurately. The system dynamically identifies high-demand experts and prefetches them during non-MoE computations, using otherwise idle network links so that fewer tokens need to be dispatched in subsequent All-to-All communications.

Examine the mathematical formulation of end-to-end training latency and the tailored pruning strategy that derives a globally optimal prefetching scheme, restoring both communication and computation balance on the underlying network infrastructure. Finally, see how prioritizing All-to-All data streams during backward passes alleviates communication blockage, with GPU cluster experiments demonstrating 15%-94.5% reductions in training time compared to existing state-of-the-art systems.
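To make the sliding-window idea concrete, here is a minimal, hypothetical sketch of popularity-based expert prediction: it counts which experts the router selected over the last few training steps and nominates the most popular ones as prefetch candidates. The class name, parameters, and logic are illustrative assumptions for this description, not the actual PopFetcher implementation.

```python
from collections import Counter, deque


class SlidingWindowPopularity:
    """Illustrative sketch (not PopFetcher's real code): predict
    high-demand experts from recent routing decisions."""

    def __init__(self, window=8, top_k=2):
        # Keep per-step selection counts for the last `window` steps only.
        self.history = deque(maxlen=window)
        self.top_k = top_k

    def record_step(self, expert_ids):
        # expert_ids: expert indices the router chose during this step.
        self.history.append(Counter(expert_ids))

    def predict_hot_experts(self):
        # Aggregate counts across the window; the top-k experts are
        # the candidates to prefetch during non-MoE computation.
        totals = Counter()
        for step_counts in self.history:
            totals.update(step_counts)
        return [expert for expert, _ in totals.most_common(self.top_k)]


# Example: skewed routing where experts 3 and 7 dominate recent steps.
pred = SlidingWindowPopularity(window=4, top_k=2)
for step in ([3, 7, 3, 1], [7, 3, 7, 2], [3, 3, 7, 0], [7, 3, 3, 5]):
    pred.record_step(step)
print(pred.predict_hot_experts())  # -> [3, 7]
```

Because the window is bounded, the predictor adapts as expert popularity drifts during training, which matches the talk's emphasis on a lightweight, dynamic mechanism.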
Syllabus
USENIX ATC '25 - PopFetcher: Towards Accelerated Mixture-of-Experts Training Via Popularity...
Taught by
USENIX