
Enabling Technologies for Next Generation Large Scale AI Backend Networking

Open Compute Project via YouTube

Overview

Learn about Microsoft's next-generation backend AI network architecture, designed for massive GPU clusters, in this conference talk from the Open Compute Project. Discover how AI acceleration has transformed networking infrastructure requirements, driving backend clusters from tens of thousands to hundreds of thousands of GPUs. Explore the distinct networking challenges these hyper-scale environments present compared to traditional data centers, including demands for ultra-low latency, high throughput, and proactive fault detection. Examine three key technologies deployed in Microsoft's data centers: Segment Routing over IPv6 (SRv6) for advanced traffic engineering, High-Frequency Streaming Telemetry (HFST) for real-time network monitoring, and packet trimming, which truncates congested packets to their headers so endpoints learn of loss quickly. Understand the implementation details and driving factors behind each technology, while gaining insight into how SAI (Switch Abstraction Interface) and SONiC (Software for Open Networking in the Cloud) support the deployment of these hyper-scale AI backend networks. Gain valuable perspective on the future of networking infrastructure as AI workloads continue to scale exponentially.
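As background for the SRv6 portion of the talk: SRv6 steers traffic by carrying an ordered segment list in an IPv6 Segment Routing Header (RFC 8754); each segment endpoint decrements a Segments Left counter and rewrites the packet's destination to the next segment. The sketch below is an illustrative toy model of that basic "End" behavior with hypothetical documentation-range addresses, not Microsoft's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class SRv6Packet:
    """Toy model of an IPv6 packet carrying a Segment Routing Header."""
    dst: str                                           # current destination address
    segments: list[str] = field(default_factory=list)  # segment list (reverse order, per RFC 8754)
    segments_left: int = 0                             # index of the next segment to visit

def process_at_endpoint(pkt: SRv6Packet) -> SRv6Packet:
    """Basic SRv6 'End' behavior: decrement Segments Left and
    copy the next segment into the destination address."""
    if pkt.segments_left == 0:
        return pkt  # final segment reached; deliver normally
    pkt.segments_left -= 1
    pkt.dst = pkt.segments[pkt.segments_left]
    return pkt

# Hypothetical engineered path through two intermediate segment endpoints.
# RFC 8754 encodes the segment list in reverse order (final segment first).
segments = ["2001:db8::3", "2001:db8::2", "2001:db8::1"]
pkt = SRv6Packet(dst=segments[2], segments=segments, segments_left=2)

visited = [pkt.dst]
while pkt.segments_left > 0:
    pkt = process_at_endpoint(pkt)
    visited.append(pkt.dst)

print(visited)  # → ['2001:db8::1', '2001:db8::2', '2001:db8::3']
```

The ingress router alone chooses the path; transit endpoints only execute the rewrite above, which is what makes SRv6 attractive for centrally computed traffic engineering across large GPU fabrics.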

Syllabus

Enabling Technologies for Next Generation Large Scale AI Backend Networking

Taught by

Open Compute Project

