Accelerating Mixture of Experts Training With Rail Optimized InfiniBand Networking in Crusoe Cloud
AI Engineer via YouTube
Overview
Learn how to accelerate mixture-of-experts (MoE) training with rail-optimized InfiniBand networking in this 18-minute conference talk from the AI Engineer World's Fair. Discover the networking challenges that arise when training state-of-the-art machine learning models with mixture-of-experts techniques, which split model layers across multiple expert sub-networks so that larger models can be trained more efficiently. Explore Crusoe Cloud's high-performance InfiniBand network architecture, designed to handle the sparse distribution of model state that puts increasing pressure on cluster-level networking during training. Understand how the "rail-optimized" design reduces the number of hops between sets of GPUs in a cluster, accelerates all-to-all communication, and ultimately shortens training time. Gain insights into using these specialized networking solutions to optimize your own large-scale training workloads.
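To make the hop-reduction idea concrete, here is a minimal Python sketch of a toy two-tier (leaf/spine) fabric. It is not Crusoe Cloud's actual topology: the function names, the `nodes_per_leaf` parameter, and the 4-node, 8-GPU cluster are all hypothetical. The sketch assumes that in a rail-optimized layout GPU k of every node attaches to leaf switch ("rail") k, that in the baseline layout all GPUs of a node share one leaf switch, and that traffic can optionally be forwarded over the intra-node link to the local GPU on the destination's rail before it leaves the node.

```python
from itertools import product

def leaf_switch(node, gpu, rail_optimized, nodes_per_leaf):
    """Return the leaf switch a GPU attaches to (toy model, assumed wiring)."""
    # Rail-optimized: GPU k of every node plugs into leaf switch k.
    # Baseline: every GPU of a node plugs into that node's shared leaf switch.
    return gpu if rail_optimized else node // nodes_per_leaf

def switch_hops(src, dst, rail_optimized, nodes_per_leaf=2, intra_node_forwarding=True):
    """Count switches traversed between two GPUs given as (node, local_gpu_index)."""
    (src_node, src_gpu), (dst_node, dst_gpu) = src, dst
    if src_node == dst_node:
        return 0  # same node: intra-node link only, no network switch
    if rail_optimized and intra_node_forwarding:
        # Assumption: data first moves inside the node to the GPU on the destination's rail.
        src_gpu = dst_gpu
    same_leaf = (leaf_switch(src_node, src_gpu, rail_optimized, nodes_per_leaf)
                 == leaf_switch(dst_node, dst_gpu, rail_optimized, nodes_per_leaf))
    return 1 if same_leaf else 3  # one leaf, or leaf -> spine -> leaf

def mean_cross_node_hops(num_nodes, gpus_per_node, **kwargs):
    """Average switch hops over all cross-node GPU pairs, as in an all-to-all exchange."""
    gpus = list(product(range(num_nodes), range(gpus_per_node)))
    hops = [switch_hops(s, d, **kwargs) for s in gpus for d in gpus if s[0] != d[0]]
    return sum(hops) / len(hops)

rail = mean_cross_node_hops(4, 8, rail_optimized=True)
baseline = mean_cross_node_hops(4, 8, rail_optimized=False)
```

Under these assumptions every cross-node pair in the rail-optimized case crosses a single leaf switch, while the baseline sends most pairs through the spine. Real clusters differ in switch radix, rail-group size, and routing, so treat the numbers as illustrative only.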
Syllabus
Accelerating Mixture of Experts Training With Rail Optimized InfiniBand Networking in Crusoe Cloud
Taught by
AI Engineer