AI Models Are Huge, but Your GPUs Aren't - Mastering Multi-Node Distributed Inference on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This CNCF conference talk covers how to serve AI models with more than 600B parameters on Kubernetes once they outgrow a single GPU. It presents production-ready strategies for the resulting infrastructure challenges, covering Day 0 and Day 1 operations with a focus on latency, cost, and accuracy tradeoffs. Topics include choosing between full-precision and quantized models, sizing worker nodes for GPU, memory, and networking performance, and managing model parallelism. The talk also addresses Kubernetes-native challenges such as topology-aware scheduling, GPU-NIC binding, and orchestrating inference phases with custom controllers, along with traffic routing strategies and adaptive approaches for balancing cost and performance at scale. It explains Prefill/Decode disaggregation in both static and pooled modes to support varied prompt lengths. The speakers draw on real-world benchmarks and production experience, and provide diagrams, checklists, and manifests for deploying distributed AI inference workloads on Kubernetes.
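To illustrate why a model of this size forces a multi-node deployment, here is a rough back-of-the-envelope sketch of the weight-memory arithmetic behind the full-precision versus quantized choice. It is not taken from the talk. The bytes-per-parameter table, the 80 GB GPU size, the 90% usable fraction, and the 8 GPUs per node are all illustrative assumptions, and the estimate counts model weights only (no KV cache, activations, or runtime overhead).

```python
import math

# Assumed storage cost per parameter for a few common precisions (illustrative).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus_for_weights(
    num_params: float,
    precision: str,
    gpu_memory_gb: float = 80.0,   # assumption: 80 GB accelerators
    usable_fraction: float = 0.9,  # assumption: 10% reserved for runtime overhead
) -> int:
    """Lower bound on GPU count so that the weights fit; ignores KV cache."""
    usable_gb = gpu_memory_gb * usable_fraction
    return math.ceil(weight_memory_gb(num_params, precision) / usable_gb)


def min_nodes(gpus: int, gpus_per_node: int = 8) -> int:
    """Lower bound on worker nodes, assuming a fixed GPU count per node."""
    return math.ceil(gpus / gpus_per_node)


if __name__ == "__main__":
    params = 600e9  # the 600B-parameter scale mentioned in the overview
    for precision in BYTES_PER_PARAM:
        gpus = min_gpus_for_weights(params, precision)
        print(
            f"{precision}: {weight_memory_gb(params, precision):.0f} GB weights, "
            f">= {gpus} GPUs, >= {min_nodes(gpus)} node(s)"
        )
```

Under these assumptions, 16-bit weights alone exceed what one 8-GPU node can hold, while 4-bit quantization brings the weights within a single node. Real deployments usually need more than these lower bounds, since parallelism schemes often require GPU counts that divide the model evenly and the KV cache grows with batch size and prompt length.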
Syllabus
AI Models Are Huge, but Your GPUs Aren’t: Mastering Multi-Node Distributed Infe... E. Wong & J. Shan
Taught by
CNCF [Cloud Native Computing Foundation]