Building Custom GPU Clusters at Scale - Using Kubespray to Create High-Performance AI Infrastructure
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn to build and deploy custom GPU clusters at scale with Kubespray in this 40-minute conference talk from CNCF. Discover how Kubespray, recognized by Kubernetes SIG Cluster Lifecycle, deploys production-ready Kubernetes clusters on bare metal with robust GPU support for AI applications. Explore Kubespray's fundamentals, key features, and latest updates, along with lessons from real-world deployments of custom GPU clusters at scale. See how to integrate Kubernetes technologies including LWS (LeaderWorkerSet), Kueue, the Gateway API Inference Extension, and DRA (Dynamic Resource Allocation), together with tensor parallelism, to optimize AI workloads such as RAG and LoRA applications. Understand how to improve resource utilization and performance for workloads like Large Language Models, which require scalable GPU infrastructure. Learn how to customize AI clusters through Kubespray's inventory source code and how to use Kubernetes operators to define infrastructure in private cloud environments. The talk also covers efficient cluster scaling techniques and common challenges in building GPU-accelerated Kubernetes clusters for production AI workloads.
Syllabus
Building Custom GPU Clusters at Scale: Using Kubespray To Create High-Perfor... Kay Yan & Rong Zhang
Taught by
CNCF [Cloud Native Computing Foundation]