Beyond Stock Outs - Scaling Inference on Mixed GPU Hardware With DRA
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize GPU resource allocation in Kubernetes environments using Dynamic Resource Allocation (DRA) to overcome hardware availability constraints and improve cost efficiency. Discover how DRA, which reached beta status in Kubernetes 1.32, enables flexible GPU allocation by allowing pods to utilize any available GPU that meets minimum specifications rather than being restricted to specific hardware types.

Explore practical strategies for writing flexible resource specifications that can adapt to mixed GPU environments, enabling deployments to scale effectively even when preferred GPU types are unavailable. Understand how DRA integrates with advanced node autoscaling solutions like Google's Custom Compute Classes and Karpenter to automatically provision virtual machines with the most available or cost-effective GPU options. Master techniques for handling scenarios where different GPU hardware requires varying device counts while maintaining deployment consistency.

Examine real-world cost savings and utilization improvements achieved through intelligent GPU resource management. Watch live demonstrations showing DRA implementation in action, including configuration examples and scaling scenarios that showcase how this approach transforms infrastructure reliability and economic efficiency in cloud-native GPU workloads.
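The "minimum specifications rather than specific hardware types" idea above can be sketched as a DRA ResourceClaimTemplate using a CEL selector on device capacity. This is a minimal illustration, not a configuration from the talk: the device class name (`gpu.example.com`), the capacity key (`memory`), and the 40Gi threshold are all hypothetical placeholders for whatever the GPU driver on a given cluster publishes.

```yaml
# Sketch: request "any GPU with at least 40Gi of memory" instead of a
# specific model, so the scheduler can satisfy the claim from whatever
# mixed hardware is actually available. API group is beta in K8s 1.32.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: flexible-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        # Hypothetical device class published by the GPU driver/operator.
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            # Match on a minimum capability, not a hardware SKU.
            expression: 'device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0'
```

A pod would then reference this template via `spec.resourceClaims` with `resourceClaimTemplateName: flexible-gpu`, letting each replica bind to any qualifying GPU the autoscaler can provision.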
Syllabus
Beyond Stock Outs: Scaling Inference on Mixed GPU Hardware With DRA - John Belamaric & Bo Fu, Google
Taught by
CNCF [Cloud Native Computing Foundation]