Beyond Stock Outs - Scaling Inference on Mixed GPU Hardware With DRA
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize GPU resource allocation in Kubernetes using Dynamic Resource Allocation (DRA) to overcome hardware availability constraints and improve cost efficiency. Discover how DRA, which reached beta status in Kubernetes 1.32, enables flexible GPU allocation by letting pods use any available GPU that meets minimum specifications rather than being restricted to a specific hardware type.

Explore practical strategies for writing flexible resource specifications that adapt to mixed GPU environments, so deployments can scale even when the preferred GPU type is unavailable. Understand how DRA integrates with advanced node autoscaling solutions such as Google's Custom Compute Classes and Karpenter to automatically provision virtual machines with the most available or cost-effective GPU option, and master techniques for handling scenarios where different GPU hardware requires varying device counts while maintaining deployment consistency.

Examine real-world cost savings and utilization improvements achieved through intelligent GPU resource management, and watch live demonstrations of DRA in action, including configuration examples and scaling scenarios that show how this approach transforms infrastructure reliability and economic efficiency in cloud-native GPU workloads.
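As a rough illustration of the kind of flexible specification discussed in the talk, the sketch below shows a DRA ResourceClaimTemplate that requests any GPU meeting a minimum memory bar instead of pinning the workload to one hardware model. This is not taken from the talk itself; the API version matches the 1.32 beta, but the device class name, driver domain, and attribute names are placeholders that vary by GPU driver and cluster.

```yaml
# Hedged sketch: a DRA claim template that accepts any qualifying GPU.
# "gpu.example.com" stands in for a real DeviceClass and driver domain
# installed by your GPU vendor's DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: any-capable-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com   # hypothetical DeviceClass
        selectors:
        - cel:
            # Match on a minimum capability (here, 40Gi of device memory)
            # rather than a specific model, so the scheduler may allocate
            # whichever available GPU satisfies the constraint.
            expression: >-
              device.capacity["gpu.example.com"].memory
                .compareTo(quantity("40Gi")) >= 0
```

A pod would then reference this template through `spec.resourceClaims` and a container-level `resources.claims` entry, rather than requesting a fixed extended resource such as a specific GPU model.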
Syllabus
Beyond Stock Outs: Scaling Inference on Mixed GPU Hardware With DRA - John Belamaric & Bo Fu, Google
Taught by
CNCF [Cloud Native Computing Foundation]