Optimizing AI Workload Scheduling - Bilibili's Journey to an Efficient Cloud Native AI Platform
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore how China's leading video platform Bilibili tackles complex AI workload scheduling challenges across multiple Kubernetes clusters in this keynote presentation.

Learn about the four critical challenges of multi-cluster AI workload management: workload diversity, where training, inference, and video processing each have different scheduling requirements; cross-cluster complexity, where workloads must be managed across an expanding set of data centers while meeting SLAs; performance demands, where short-running tasks need minimal startup latency and high scheduling efficiency; and the trade-off between efficiency and quality of service, where resource utilization must be maximized without degrading service.

Discover specific optimization techniques: leveraging and enhancing CNCF projects such as Karmada and Volcano to build a unified, high-performance AI workload scheduling platform; integrating technologies such as KubeRay to schedule a range of online and offline AI workloads; and maximizing resource efficiency through hybrid scheduling of online and offline workloads, tidal scheduling, and other techniques. Gain insights from real-world experience in building an efficient cloud native AI platform that handles diverse workload requirements while maintaining performance and reliability standards.
Syllabus
Keynote: Optimizing AI Workload Scheduling: Bilibili's Journey To an Effici... Long Xu & Kevin Wang
Taught by
CNCF [Cloud Native Computing Foundation]