Overview
Learn the fundamentals of AI cluster networking design in this technical presentation, which explores how AI infrastructure is deployed across organizations of different scales. Discover how AI adoption varies from hyperscalers managing hundreds of thousands of GPUs to enterprises beginning with just a few hundred, and understand the three primary scaling categories: scale-up within servers, scale-out across servers, and scale-across between data centers. Examine how different use cases, from foundational model training to fine-tuning and inference, require tailored networking solutions that account for varying R&D budgets and technical capabilities.

Explore the architecture of modern AI clusters, focusing on GPU servers capable of generating 6.4 terabits per second of line-rate traffic per server and the multiple distinct networks they require. Understand the recent shift in networking best practices: front-end and storage networks are increasingly converged to save cost, while a separate inter-GPU backend network remains dedicated to GPU-to-GPU communication for distributed jobs.

Follow an end-to-end traffic flow analysis that shows how user requests traverse standard data center fabrics and interact with applications and centralized services before reaching the AI cluster front-end network, where they may access various storage systems or trigger inter-GPU backend communication. Gain insight into why solving AI networking challenges requires innovation at every network entry and exit point, not just in the inter-GPU backend.
Syllabus
Introduction to Cisco AI Cluster Networking Design with Paresh Gupta
Taught by
Tech Field Day