Introduction to Cisco AI Cluster Networking Design

Tech Field Day via YouTube

Learn the fundamentals of AI cluster networking design in this technical presentation, which surveys how AI infrastructure is deployed at different organizational scales. Discover how AI adoption ranges from hyperscalers managing hundreds of thousands of GPUs to enterprises starting with just a few hundred, and understand the three primary scaling categories: scale-up within servers, scale-out across servers, and scale-across between data centers. Examine how different use cases, from foundational model training to fine-tuning and inference, require tailored networking solutions that account for varying R&D budgets and technical capabilities. Explore the architecture of modern AI clusters, focusing on GPU servers that can each generate 6.4 terabits per second of line-rate traffic and on the multiple distinct networks they require. Understand the recent shift in networking best practices, in which front-end and storage networks are increasingly converged to save cost while a separate inter-GPU backend network remains dedicated to GPU-to-GPU communication for distributed jobs. Follow an end-to-end traffic flow analysis that shows how user requests traverse standard data center fabrics and interact with applications and centralized services before reaching the AI cluster front-end network, where they may access various storage systems or trigger inter-GPU backend communication. Gain insight into why solving AI networking challenges requires innovation at every network entry and exit point, not just in the inter-GPU backend.