Kubernetes for Multi-Host Training and Inference - Workload Aware Scheduling
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This 38-minute CNCF conference talk covers Kubernetes scheduling strategies designed for AI/ML training and inference workloads. The current scheduling framework places pods one at a time, which is a poor fit for AI/ML workloads whose pods must be tightly coordinated: it makes all-or-nothing scheduling and topology-aware compact placement difficult during initial scheduling, after failures, and under preemption. The talk presents the approaches SIG Scheduling is developing to optimize Kubernetes for multi-host AI/ML workloads, explains the specific requirements and challenges these workloads present, and invites discussion of design solutions that could make Kubernetes the premier platform for running distributed machine learning tasks.
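To make the two terms in the overview concrete, here is a minimal Python sketch contrasting pod-by-pod placement with all-or-nothing, topology-aware placement. It is a toy model, not the Kubernetes scheduler or any SIG Scheduling design: the function names (`place_greedy`, `schedule_gang_compact`), the rack/node layout, and the "smallest rack that fits" heuristic are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def place_greedy(free_slots: Dict[str, int], gang_size: int) -> List[str]:
    """Pod-by-pod placement: fill nodes one pod at a time.

    Returns one node name per placed pod. The result may be shorter than
    gang_size, which models a partially scheduled (stranded) workload.
    """
    placed: List[str] = []
    for node, free in free_slots.items():
        while free > 0 and len(placed) < gang_size:
            placed.append(node)
            free -= 1
    return placed


def schedule_gang_compact(
    racks: Dict[str, Dict[str, int]], gang_size: int
) -> Optional[List[str]]:
    """All-or-nothing placement confined to a single rack (hypothetical heuristic).

    Picks the rack with the least free capacity that can still hold the
    whole gang, so every pod lands in one topology domain. Returns None
    if no single rack can fit the gang; nothing is partially placed.
    """
    candidates = [
        (sum(nodes.values()), name)
        for name, nodes in racks.items()
        if sum(nodes.values()) >= gang_size
    ]
    if not candidates:
        return None
    _, best_rack = min(candidates)
    return place_greedy(racks[best_rack], gang_size)


# Hypothetical cluster: free accelerator slots per node, grouped by rack.
racks = {
    "rack-a": {"a1": 2, "a2": 1},
    "rack-b": {"b1": 4, "b2": 4},
}

partial = place_greedy(racks["rack-a"], 4)   # only 3 of 4 pods fit
compact = schedule_gang_compact(racks, 4)    # whole gang placed in rack-b
too_big = schedule_gang_compact(racks, 9)    # no rack fits, so nothing is placed
```

In this toy model, pod-by-pod placement leaves a 4-pod job with only 3 pods placed on `rack-a`, holding slots without being able to make progress, while the gang variant either places all pods in one rack or places none.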
Syllabus
Kubernetes for Multi-Host Training and Inference: Workload Aware Sc... Eric Tune & Dominik Marcinski
Taught by
CNCF [Cloud Native Computing Foundation]