SRv6 for AI Backend Networks - Enabling Continental-Scale GPU Clusters
Open Compute Project via YouTube
Overview
Learn how Microsoft engineers apply Segment Routing over IPv6 (SRv6) to routing challenges in Ethernet-based AI backend networks in this 23-minute conference presentation. Discover why traditional BGP+ECMP schemes fall short of the communication requirements of AI training jobs, and explore how SRv6, originally designed for wide-area network traffic engineering, provides fine-grained path control in AI backend environments. Understand the methodology for implementing SRv6 to maximize network utilization, strengthen fabric resiliency, and enable continental-scale GPU clusters. Microsoft Software Engineer II Changrong Wu and Principal Software Engineer Abhishek Dosi demonstrate practical applications of this approach to AI infrastructure networking.
Syllabus
SRv6 for AI Backend
Taught by
Open Compute Project