Accelerating Mixture of Experts Training With Rail Optimized InfiniBand Networking in Crusoe Cloud
AI Engineer via YouTube
Overview
Learn how to accelerate mixture of experts (MoE) training with rail-optimized InfiniBand networking in this 18-minute conference talk from the AI Engineer World's Fair. Discover the networking challenges that arise when training state-of-the-art models built with mixture of experts techniques. These techniques split certain layers into many expert subnetworks and route each token to only a few of them, which makes it efficient to train much larger models. When the experts, and therefore the model state, are distributed sparsely across many GPUs, routing tokens to them requires all-to-all exchanges that put increasing pressure on cluster-level networking during training. Explore Crusoe Cloud's high-performance InfiniBand network architecture, designed specifically for this traffic pattern. Understand the "rail-optimized" design approach, which reduces the number of switch hops between sets of GPUs in a cluster, speeds up all-to-all communication, and ultimately shortens training time. Finally, learn how to use these networking capabilities to optimize your own large-scale training workloads.
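To make the "fewer hops" idea concrete, here is a minimal sketch that counts switch traversals for all-to-all traffic in two hypothetical two-tier (leaf/spine) fabrics. It assumes one NIC per GPU and treats hop count as a rough proxy for latency and congestion. In the rail-optimized layout, GPU r of every server in a group is cabled to leaf switch r (rail r). In the comparison layout, all NICs of a server share one leaf. The names `Gpu`, `rail_optimized_hops`, `node_grouped_hops` and `mean_all_to_all_hops`, and the cluster sizes, are illustrative assumptions and do not describe Crusoe's actual fabric. The sketch also shows that the savings depend on keeping inter-server traffic on a single rail. For all-to-all, that means first forwarding data over NVLink to the GPU in the source server that sits on the destination's rail. NCCL's PXN feature works along these lines.

```python
from dataclasses import dataclass
from functools import partial
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class Gpu:
    node: int        # server index in the cluster
    local_rank: int  # GPU index inside the server (0 .. gpus_per_node - 1)


def rail_optimized_hops(src: Gpu, dst: Gpu, leaf_ports: int,
                        nvlink_forwarding: bool) -> int:
    """Switch traversals in a hypothetical two-tier rail-optimized fabric.

    Rail r is the leaf switch that GPU r of every server in a group of
    `leaf_ports` servers is cabled to.
    """
    if src.node == dst.node:
        return 0  # intra-server traffic stays on NVLink
    same_group = src.node // leaf_ports == dst.node // leaf_ports
    same_rail = src.local_rank == dst.local_rank
    if same_group and (same_rail or nvlink_forwarding):
        # NIC -> rail leaf -> NIC. With forwarding, the data first moves over
        # NVLink to the source-server GPU that sits on the destination's rail.
        return 1
    return 3  # leaf -> spine -> leaf


def node_grouped_hops(src: Gpu, dst: Gpu, leaf_ports: int,
                      gpus_per_node: int) -> int:
    """Switch traversals when every NIC of a server plugs into the same leaf."""
    if src.node == dst.node:
        return 0
    nodes_per_leaf = leaf_ports // gpus_per_node
    if src.node // nodes_per_leaf == dst.node // nodes_per_leaf:
        return 1
    return 3  # leaf -> spine -> leaf


def mean_all_to_all_hops(num_nodes: int, gpus_per_node: int,
                         hop_fn: Callable[[Gpu, Gpu], int]) -> float:
    """Average switch traversals over every inter-server GPU pair."""
    gpus = [Gpu(n, r) for n in range(num_nodes) for r in range(gpus_per_node)]
    hops = [hop_fn(s, d) for s, d in product(gpus, gpus) if s.node != d.node]
    return sum(hops) / len(hops)


if __name__ == "__main__":
    NODES, GPUS, PORTS = 16, 8, 32  # illustrative sizes only
    grouped = partial(node_grouped_hops, leaf_ports=PORTS, gpus_per_node=GPUS)
    rail = partial(rail_optimized_hops, leaf_ports=PORTS, nvlink_forwarding=False)
    rail_fwd = partial(rail_optimized_hops, leaf_ports=PORTS, nvlink_forwarding=True)
    for name, fn in [("node-grouped", grouped), ("rail, direct", rail),
                     ("rail + NVLink forwarding", rail_fwd)]:
        print(f"{name:>26}: {mean_all_to_all_hops(NODES, GPUS, fn):.2f} switch hops")
```

With these assumed sizes (16 servers, 8 GPUs each, 32-port leaves), the averages come out to 2.60 hops for the node-grouped layout, 2.75 for rail-optimized without forwarding, and 1.00 with NVLink forwarding. Rail optimization pays off when the communication library aligns traffic with the rails. Real fabrics also differ in bandwidth, oversubscription and adaptive routing, which this hop-count model ignores.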
Syllabus
Accelerating Mixture of Experts Training With Rail Optimized InfiniBand Networking in Crusoe Cloud
Taught by
AI Engineer