Increasing GPU Utilization on Kubernetes Clusters for AI/ML Workloads
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore strategies for optimizing GPU utilization in large-scale Kubernetes clusters dedicated to AI/ML workloads in this conference talk. Learn how open-source tools can be combined to maximize the efficiency of 10,000 A100 GPUs spread across 20 on-premises Kubernetes clusters. The talk covers optimizations at four levels: hardware partitioning with NVIDIA Multi-Instance GPU (MIG), scheduler improvements with Volcano, application-layer enhancements using PaddlePaddle for smarter distribution of training jobs, and multi-cluster management with Armada. Gain insight into pitfalls, best practices, and recommendations drawn from real-world experience on four large-scale projects completed in Q4 2023, and see how complex GPU optimization setups are implemented in practice in AI/ML environments.
Syllabus
Increasing GPU Utilisation on K8s Clusters Dedicated for AI/ML Workloads
Taught by
CNCF [Cloud Native Computing Foundation]