Stuck in Tutorial Hell? Learn Backend Dev the Right Way
Learn AI, Data Science & Business — Earn Certificates That Get You Hired
Overview
AI, Data Science & Cloud Certificates from Google, IBM & Meta — 40% Off
One plan covers every Professional Certificate on Coursera. 40% off Coursera Plus Annual.
Unlock All Certificates
Explore strategies for optimizing resource utilization in Kubernetes clusters by co-locating CPU and GPU workloads. Learn how Ant Financial and Alibaba achieved a 10% increase in utilization through innovative approaches. Discover the creation of a new QoS class, implementation of node-level cgroups for batch jobs, and use of PodGroup CRD for gang scheduling. Gain insights into building and managing a co-location cluster with over 100 GPU and 500 CPU nodes, effectively combining long-running services and AI batch jobs. This 37-minute conference talk from the Linux Foundation provides valuable experience and practices for maximizing resource efficiency in Kubernetes environments.
Syllabus
Co-Location of CPU and GPU Workloads with High Resource Efficiency - Penghao Cen & Jian He
Taught by
Linux Foundation