Kubernetes for Multi-Host Training and Inference - Workload Aware Scheduling
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This 38-minute CNCF conference talk covers Kubernetes scheduling strategies designed for AI/ML training and inference workloads. The current scheduling framework places pods one at a time, which is a poor fit for AI/ML workloads whose pods must be tightly coordinated. Two needs are hard to meet under that model: all-or-nothing scheduling, and topology-aware compact placement, whether at initial scheduling, after failures, or after preemptions. The talk presents the approaches SIG Scheduling is developing to optimize Kubernetes for multi-host AI/ML workloads, explains the specific requirements and challenges these workloads present, and invites discussion of design solutions that could make Kubernetes the premier platform for running distributed machine learning tasks.
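To make the all-or-nothing idea concrete, here is a minimal Python sketch. It is a toy model, not Kubernetes scheduler code and not the design presented in the talk. The node names, the GPU counts, and both function names are hypothetical, and placement uses a simple first-fit rule chosen only for illustration.

```python
from typing import Dict, List, Optional


def place_pod_by_pod(free: Dict[str, int], demands: List[int]) -> Dict[int, str]:
    """Bind each pod independently to the first node with enough free GPUs.

    Pods that do not fit are skipped, so a multi-pod job can end up
    partially placed, holding resources without being able to run.
    """
    remaining = dict(free)  # copy so the caller's view is not mutated
    placed: Dict[int, str] = {}
    for pod, need in enumerate(demands):
        for node in sorted(remaining):
            if remaining[node] >= need:
                remaining[node] -= need
                placed[pod] = node
                break
    return placed


def place_all_or_nothing(free: Dict[str, int], demands: List[int]) -> Optional[Dict[int, str]]:
    """Return a placement only if every pod in the group fits, else None.

    Assumption: first-fit is used as the feasibility check. It is not an
    exhaustive search, so with mixed pod sizes it can reject a group that
    some other assignment would fit.
    """
    placed = place_pod_by_pod(free, demands)
    return placed if len(placed) == len(demands) else None


# Hypothetical cluster: two nodes with 4 free GPUs each.
cluster = {"node-a": 4, "node-b": 4}
# Hypothetical job: three pods that each need 4 GPUs.
job = [4, 4, 4]

partial = place_pod_by_pod(cluster, job)    # two of three pods are bound
gang = place_all_or_nothing(cluster, job)   # None: nothing is bound
```

In this toy model, pod-by-pod placement binds two pods and leaves the third pending, so eight GPUs are held by a job that cannot start. The all-or-nothing variant rejects the whole group and leaves the capacity free for other work.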
Syllabus
Kubernetes for Multi-Host Training and Inference: Workload Aware Sc... Eric Tune & Dominik Marcinski
Taught by
CNCF [Cloud Native Computing Foundation]