Scale Up and Scale Out AI Fabrics - A Polymorphic Ethernet Architecture for Systems of Systems

Open Compute Project via YouTube

Overview

Explore the technical challenges and solutions in designing AI fabrics that meet both scale-up and scale-out requirements in this 22-minute conference presentation by Jai Kumar, Distinguished Engineer at Broadcom. Learn how to reconcile competing demands. Inference workloads need low-latency, bandwidth-efficient interconnects within a unified GPU memory domain. Large language model training workloads need multi-tiered architectures that manage distributed GPU domains. Discover how Ethernet can serve as the basis for a polymorphic architecture that converges these conflicting requirements into a robust system of systems. Examine key technical considerations, including memory versus network semantics, protocol overhead optimization, latency management, fabric topology design, and congestion control algorithms. See how to address challenges such as incast and outcast traffic patterns, multipathing, and remote memory access in distributed AI computing environments.

Syllabus

Scale Up and Scale Out AI Fabrics - A Polymorphic Ethernet Architecture for Systems of Systems

Taught by

Open Compute Project
