Overview
Learn advanced AI cluster networking solutions through a technical demonstration of dynamic load balancing (DLB) for GPU-to-GPU fabric communications. Explore the critical challenge of congestion and load balancing in AI networking, where traditional ECMP load balancing falls short: random flow hashing creates severe traffic imbalances, with some links congested at 105% of capacity while others sit underused at 60%. Understand how this non-uniform utilization triggers pause frames and out-of-order packets that badly degrade tightly coupled collective communication jobs. Discover Cisco's advanced load-balancing techniques that move beyond simple ECMP, including a "flowlet" dynamic load balancing approach in which switches detect inter-packet gaps to identify flowlets and route each one over the least-congested link. Examine a fully validated joint reference architecture, co-designed with NVIDIA, that combines Cisco's per-packet DLB with NVIDIA's adaptive routing and direct data placement capabilities on the SuperNIC. Review video benchmarks demonstrating how this auto-negotiated handshake between switch and NIC improved application-level bus bandwidth by 35-40% and virtually eliminated pause frames in a 64-GPU cluster compared to standard ECMP. Understand how the P4-programmable architecture of Cisco's Silicon One ASIC enables new features to be delivered without multi-year hardware respins, and learn about the foundational work being standardized by the Ultra Ethernet Consortium (UEC) to provide turnkey solutions that rival hyperscaler network performance.
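To make the contrast between ECMP hashing and the flowlet approach described above more concrete, here is a minimal Python sketch. It is a toy model, not Cisco's implementation: the class name `FlowletBalancer`, the `gap_threshold` parameter, and the use of cumulative bytes as the congestion signal are all assumptions made for illustration (a real switch would more likely look at queue depth or link utilization in hardware).

```python
import zlib
from dataclasses import dataclass, field


def ecmp_route(flow_id: str, num_links: int) -> int:
    """Static ECMP: hash the flow identifier, ignoring link load entirely."""
    return zlib.crc32(flow_id.encode()) % num_links


@dataclass
class FlowletBalancer:
    """Toy flowlet-based balancer (hypothetical, for illustration only)."""
    num_links: int
    gap_threshold: float  # assumed idle gap (seconds) that starts a new flowlet
    link_load: list = field(default_factory=list)   # bytes sent per link
    flow_state: dict = field(default_factory=dict)  # flow_id -> (link, last_seen)

    def __post_init__(self):
        self.link_load = [0] * self.num_links

    def route(self, flow_id: str, timestamp: float, size: int) -> int:
        state = self.flow_state.get(flow_id)
        if state is not None and timestamp - state[1] < self.gap_threshold:
            # Same flowlet: stay on the current link to preserve packet order.
            link = state[0]
        else:
            # Gap detected (or new flow): pick the least-loaded link.
            link = min(range(self.num_links), key=self.link_load.__getitem__)
        self.link_load[link] += size
        self.flow_state[flow_id] = (link, timestamp)
        return link


lb = FlowletBalancer(num_links=2, gap_threshold=1.0)
first = lb.route("a", 0.0, 100)   # new flow, picks an empty link
second = lb.route("a", 0.5, 100)  # within the gap, stays on the same link
other = lb.route("b", 0.6, 100)   # new flow, steered to the idle link
moved = lb.route("a", 2.0, 100)   # gap exceeded, new flowlet may move
```

In this sketch, a flow only changes links after an idle gap longer than `gap_threshold`, which is the property that lets flowlet switching rebalance traffic while limiting out-of-order delivery. `ecmp_route`, by contrast, always returns the same link for a given flow regardless of load.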
Syllabus
Cisco AI Cluster Networking Operations DLB Demo with Paresh Gupta
Taught by
Tech Field Day