Increasing GPU Utilization on Kubernetes Clusters for AI/ML Workloads
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore strategies for optimizing GPU utilization in large-scale Kubernetes clusters dedicated to AI/ML workloads in this conference talk. Learn how to maximize the efficiency of 10,000 A100 GPUs across 20 on-premises Kubernetes clusters using open-source solutions. The talk covers hardware-level optimizations such as NVIDIA MIG, scheduler improvements with Volcano, application-layer enhancements using PaddlePaddle for smarter training job distribution, and multi-cluster management with Armada. It also shares pitfalls, best practices, and recommendations drawn from four large-scale projects completed in Q4 2023, showing how complex GPU optimization setups are implemented in practice in AI/ML environments.
Syllabus
Increasing GPU Utilisation on K8s Clusters Dedicated for AI/ML Workloads
Taught by
CNCF [Cloud Native Computing Foundation]