Kubernetes for Multi-Host Training and Inference - Workload Aware Scheduling
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn about advanced Kubernetes scheduling strategies designed specifically for AI/ML training and inference workloads in this 38-minute conference talk from CNCF. Discover why the current pod-by-pod scheduling framework creates challenges for AI/ML workloads that need tight coordination between pods. These challenges include all-or-nothing (gang) scheduling and topology-aware compact placement, both at initial scheduling and after failures and preemptions. Explore the approaches SIG Scheduling is developing to optimize Kubernetes for multi-host AI/ML workloads, understand the specific requirements these workloads impose, and engage with the discussion of design solutions that could make Kubernetes the premier platform for running distributed machine learning jobs.
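To make the contrast between pod-by-pod and all-or-nothing placement concrete, here is a toy Python model. It is not the Kubernetes scheduler or any SIG Scheduling design. The `Node` class, the `rack` topology label, and both functions are hypothetical and exist only for illustration. The pod-by-pod version places each pod independently, so it can leave a job half-placed while still holding GPUs. The gang version commits only if every pod fits inside a single rack. It uses simple first-fit, not optimal bin packing.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str       # hypothetical topology domain (e.g. a rack or network block)
    free_gpus: int


def schedule_pod_by_pod(nodes, pods_gpus):
    """Place each pod independently on the first node with room.

    Pods that do not fit get None, but earlier pods keep their GPUs,
    which is the partial-placement problem gang scheduling avoids.
    """
    free = {n.name: n.free_gpus for n in nodes}
    placements = []
    for gpus in pods_gpus:
        for n in nodes:
            if free[n.name] >= gpus:
                free[n.name] -= gpus
                placements.append(n.name)
                break
        else:
            placements.append(None)  # pod stays pending
    return placements


def schedule_gang(nodes, pods_gpus):
    """All-or-nothing, topology-aware placement (first-fit within one rack).

    Returns a full list of node names if the whole group fits in a single
    rack, otherwise None, meaning no pod is placed at all.
    """
    for rack in sorted({n.rack for n in nodes}):
        free = {n.name: n.free_gpus for n in nodes if n.rack == rack}
        placements = []
        for gpus in pods_gpus:
            target = next((name for name, f in free.items() if f >= gpus), None)
            if target is None:
                break  # this rack cannot host the whole group
            free[target] -= gpus
            placements.append(target)
        if len(placements) == len(pods_gpus):
            return placements  # commit only when every pod fits
    return None


if __name__ == "__main__":
    cluster = [Node("a1", "r1", 4), Node("b1", "r2", 8), Node("b2", "r2", 8)]
    print(schedule_pod_by_pod(cluster, [4, 4, 4, 4]))  # spread across racks
    print(schedule_gang(cluster, [4, 4, 4, 4]))        # compact, one rack
    print(schedule_pod_by_pod(cluster, [8, 8, 8]))     # partial placement
    print(schedule_gang(cluster, [8, 8, 8]))           # nothing placed
```

With a four-pod job of 4 GPUs each, the pod-by-pod version splits the job across both racks, while the gang version keeps it on rack `r2`. With three 8-GPU pods, the pod-by-pod version strands two placed pods waiting on a third that can never fit. The gang version places nothing and leaves those GPUs free for other work.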
Syllabus
Kubernetes for Multi-Host Training and Inference: Workload Aware Sc... Eric Tune & Dominik Marcinski
Taught by
CNCF [Cloud Native Computing Foundation]