Building Custom GPU Clusters at Scale - Using Kubespray to Create High-Performance AI Infrastructure
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn to build and deploy custom GPU clusters at scale with Kubespray in this 40-minute conference talk from CNCF. Kubespray, a Kubernetes SIG Cluster Lifecycle project, deploys production-ready Kubernetes clusters on bare metal and supports GPUs for AI applications. The talk covers Kubespray's fundamentals, key features, and latest updates, along with lessons from real-world deployments of custom GPU clusters at scale.
See how LWS (LeaderWorkerSet), Kueue, Gateway API Inference Extension, and DRA (Dynamic Resource Allocation) combine with tensor parallelism to serve AI workloads such as RAG and LoRA applications, and how they improve resource utilization and performance for Large Language Models that need scalable GPU infrastructure. Learn how to customize AI clusters through Kubespray's inventory, how to use Kubernetes operators to define infrastructure in private cloud environments, and how to scale clusters efficiently while avoiding common pitfalls of building GPU-accelerated Kubernetes clusters for production AI workloads.
Syllabus
Building Custom GPU Clusters at Scale: Using Kubespray To Create High-Perfor... Kay Yan & Rong Zhang
Taught by
CNCF [Cloud Native Computing Foundation]