CNCF [Cloud Native Computing Foundation]

Kubernetes for Multi-Host Training and Inference - Workload Aware Scheduling

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn about advanced Kubernetes scheduling strategies designed for AI/ML training and inference workloads in this 38-minute conference talk from CNCF. Discover how the current pod-by-pod scheduling approach creates challenges for AI/ML workloads that require tight coordination between pods: it makes all-or-nothing scheduling and topology-aware compact placement difficult during initial scheduling, after failures, and during preemptions. Explore the approaches SIG Scheduling is developing to optimize Kubernetes for multi-host AI/ML workloads, understand the requirements and challenges these workloads present, and join the discussion of design solutions that could make Kubernetes the premier platform for running distributed machine learning tasks.

Syllabus

Kubernetes for Multi-Host Training and Inference: Workload Aware Sc... Eric Tune & Dominik Marcinski

Taught by

CNCF [Cloud Native Computing Foundation]
