Enabling Efficient GPU Communication over Multiple NICs with FuseLink

USENIX via YouTube

Overview

Learn how to overcome GPU communication bottlenecks in machine learning clusters in this 17-minute conference talk from OSDI '25. The talk presents FuseLink, a system that addresses a limitation of static GPU-NIC bindings: they prevent ML servers from fully utilizing their multiple network interface cards (NICs). With imbalanced communication patterns, as in large language model serving, expert-parallel training, and recommendation model training, traditional systems bottleneck at hot-spot NICs. FuseLink extends the inter-server network by integrating high-speed intra-server connections and using GPUs to relay traffic to idle NICs. It integrates with NCCL, so ML applications benefit without code modifications. Reported results include up to 212 GBps of bandwidth between GPUs on different servers, a 1.04-2.73× reduction in LLM first-token generation latency, a 1.3× improvement in mixture-of-experts model training throughput, and a 1.2× speedup in deep learning recommendation model training.
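To make the hot-spot NIC problem concrete, here is a minimal toy model in Python. It is not FuseLink's implementation or algorithm: the function names, the traffic figures, and the per-NIC bandwidth are all hypothetical. It also assumes that intra-server links are never the bottleneck and that relayed traffic can be split evenly across NICs, which a real system cannot take for granted.

```python
def static_binding_time(demand_gb, nic_gb_per_s):
    """Transfer time when each GPU may only use its own bound NIC.

    The GPU with the most outbound data determines the finish time,
    while the NICs bound to lightly loaded GPUs sit mostly idle.
    """
    return max(d / nic_gb_per_s for d in demand_gb)


def relayed_time(demand_gb, nic_gb_per_s):
    """Idealized transfer time when traffic can be relayed to any NIC.

    Assumes (hypothetically) a perfectly even split across all NICs
    and no cost for moving data between GPUs inside the server.
    """
    total_nic_bandwidth = nic_gb_per_s * len(demand_gb)
    return sum(demand_gb) / total_nic_bandwidth


# Hypothetical imbalanced workload: one GPU per NIC, data in GB.
demand_gb = [80.0, 10.0, 5.0, 5.0]
nic_gb_per_s = 50.0  # hypothetical per-NIC bandwidth in GB/s

t_static = static_binding_time(demand_gb, nic_gb_per_s)
t_relayed = relayed_time(demand_gb, nic_gb_per_s)
print(f"static binding: {t_static:.2f} s, idealized relay: {t_relayed:.2f} s")
```

In this toy setting the gain grows with the imbalance: when every GPU sends the same amount of data, the two functions return the same time, so relaying only helps when some NICs would otherwise be idle.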

Syllabus

OSDI '25 - Enabling Efficient GPU Communication over Multiple NICs with FuseLink

Taught by

USENIX
