Unlocking Heterogeneous AI Infrastructure in Kubernetes Clusters - Leveraging HAMi
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore the challenges and solutions for managing heterogeneous AI infrastructure in Kubernetes clusters through this informative conference talk. Delve into the HAMi project, designed to address the complexities of integrating diverse AI devices such as NVIDIA, Intel, and Huawei Ascend. Learn how to improve resource utilization, implement unified scheduling and observability, and enhance GPU sharing capabilities. Discover flexible scheduling strategies for GPUs, including NUMA affinity/anti-affinity and binpack/spread options. Gain insights into integrating HAMi with other projects like Volcano and scheduler-plugin. Examine real-world case studies from production-level users and discuss ongoing challenges and future roadmap for heterogeneous AI infrastructure management in Kubernetes environments.
Syllabus
Unlocking Heterogeneous AI Infrastructure K8s Cluster: Leveraging the Power of HAMi - Xiao Zhang
Taught by
CNCF [Cloud Native Computing Foundation]