

AI Backend - Deploying SRv6 uSID and SONiC for Deterministic Load Balancing

NANOG via YouTube

Overview

Explore a technical deep-dive conference talk on Microsoft's deployment of SRv6 to address networking challenges in large-scale AI training clusters. Learn how the synchronized nature of AI workloads creates massive, bursty elephant flows that break traditional data center designs, causing ECMP hash collisions, congestion, and significant delays in job completion. Discover how traffic from large-scale training jobs is characterized, and compare NIC-based and switch-based load balancing techniques. Understand Microsoft's shift to deterministic multipathing using Segment Routing over IPv6 (SRv6), a form of source routing, to ensure conflict-free traffic placement in AI backend networks. Gain practical insights into the implementation of SRv6 uSID within the SONiC network operating system, including operational data on deploying, monitoring, and troubleshooting this architecture in production. The talk is presented by Pablo Camarillo, Principal Engineer at Cisco and lead architect of SRv6 technology, who shares lessons learned from implementing this solution in one of the world's largest AI infrastructures, with a focus on practical application in hyperscale network fabrics.
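
To make the contrast between hash-based ECMP and deterministic placement concrete, here is a minimal Python sketch. It is not taken from the talk. The flow tuples, the path count, and the use of CRC32 as the hash are all assumptions chosen for illustration; real switches use their own hash functions, and the SRv6 encoding of the chosen path as a segment list is not modeled.

```python
import zlib
from collections import Counter

NUM_PATHS = 4  # hypothetical number of equal-cost uplinks

# Hypothetical 5-tuples (src, dst, proto, sport, dport) for 8 large flows.
flows = [
    (f"10.0.0.{i}", f"10.0.1.{i}", 17, 49152 + i, 4791)
    for i in range(8)
]

def ecmp_path(flow, num_paths=NUM_PATHS):
    """Pick a path by hashing the 5-tuple (CRC32 is a stand-in hash)."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % num_paths

def deterministic_paths(flows, num_paths=NUM_PATHS):
    """Assign flows to paths round-robin so the load is as even as possible."""
    return {flow: i % num_paths for i, flow in enumerate(flows)}

# Count how many flows land on each path under each scheme.
ecmp_load = Counter(ecmp_path(f) for f in flows)
det_load = Counter(deterministic_paths(flows).values())

print("ECMP flows per path:         ", dict(ecmp_load))
print("Deterministic flows per path:", dict(det_load))
```

With hashing, nothing prevents several elephant flows from landing on the same uplink while another sits idle. The round-robin assignment always places exactly two of the eight flows on each of the four paths, which is the kind of conflict-free placement the description attributes to source routing.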

Syllabus

AI Backend: Deploying SRv6 uSID and SONiC for Deterministic Load Balancing

Taught by

NANOG
