AI Models Are Huge, but Your GPUs Aren't - Mastering Multi-Node Distributed Inference on Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn to deploy AI models exceeding 600B parameters for inference on Kubernetes in this conference talk from CNCF. Explore production-ready strategies for the infrastructure challenges that arise when a model outgrows a single GPU, covering day 0/1 operations with a focus on latency, cost, and accuracy tradeoffs. Discover how to choose between full-precision and quantized models, size worker nodes for GPU, memory, and networking performance, and manage model parallelism effectively. Master Kubernetes-native challenges including topology-aware scheduling, GPU-NIC binding, and orchestrating inference phases with custom controllers. Examine traffic routing strategies and adaptive approaches that balance cost and performance at scale. Understand prefill/decode disaggregation in both static and pooled modes to support varied prompt lengths. Gain practical insights from real-world benchmarks and production experience, and walk away with actionable diagrams, checklists, and manifests for deploying distributed AI inference workloads on Kubernetes.
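
To get a feel for why models of this size outgrow a single GPU, the weight footprint can be estimated with simple arithmetic. The sketch below is illustrative only and is not taken from the talk: the function names, the 80 GB per-GPU capacity, and the 16-, 8-, and 4-bit weight formats are all assumptions, and the estimate counts model weights only (no KV cache, activations, or runtime overhead), so real deployments need more headroom.

```python
import math


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def min_gpus_for_weights(num_params: int, bits_per_param: int, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count needed just to hold the weights.

    Ignores KV cache, activations, and framework overhead (assumption).
    """
    return math.ceil(weight_memory_gb(num_params, bits_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    params = 600_000_000_000  # 600B parameters, the scale named in the overview
    gpu_gb = 80               # hypothetical per-GPU memory, not from the talk
    for bits in (16, 8, 4):   # assumed weight formats for comparison
        gb = weight_memory_gb(params, bits)
        gpus = min_gpus_for_weights(params, bits, gpu_gb)
        print(f"{bits}-bit weights: {gb:.0f} GB, at least {gpus} GPUs of {gpu_gb} GB")
```

Under these assumptions, 16-bit weights alone come to roughly 1,200 GB, more than a single 8-GPU node of 80 GB devices can hold, which is the point at which multi-node inference and the precision-versus-quantization tradeoff become relevant.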

Syllabus

AI Models Are Huge, but Your GPUs Aren’t: Mastering Multi-Node Distributed Infe... E. Wong & J. Shan

Taught by

CNCF [Cloud Native Computing Foundation]
