Optimizing AI Workload Scheduling - Bilibili's Journey to an Efficient Cloud Native AI Platform
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore how China's leading video platform Bilibili tackles AI workload scheduling across multiple Kubernetes clusters in this keynote presentation.
Learn about four critical challenges in multi-cluster AI workload management: workload diversity, where training, inference, and video processing each have different scheduling requirements; cross-cluster complexity, where workloads with SLAs must be managed across expanding data centers; performance demands, where short-running tasks need minimal startup latency and high scheduling efficiency; and the balance between quality of service and maximum resource utilization.
Discover specific optimization techniques: leveraging and enhancing CNCF projects such as Karmada and Volcano to build a unified, high-performance AI workload scheduling platform; integrating technologies such as KubeRay to schedule a variety of online and offline AI workloads; and maximizing resource efficiency through hybrid scheduling of online and offline workloads, tidal scheduling, and other advanced techniques.
Gain insights from real-world experience in building an efficient cloud native AI platform that handles diverse workload requirements while maintaining performance and reliability standards.
Syllabus
Keynote: Optimizing AI Workload Scheduling: Bilibili's Journey To an Effici... Long Xu & Kevin Wang
Taught by
CNCF [Cloud Native Computing Foundation]