New Pattern for Sailing Multi-host LLM Inference
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk that introduces LeaderWorkerSet (LWS), a specialized Kubernetes project designed to address the challenges of distributed large language model inference across multiple hosts. Learn how this solution, developed under the guidance of Kubernetes SIG Apps and the Serving Working Group, tackles the complexity of serving massive foundation models like Llama 3.1 405B and DeepSeek R1 that cannot fit on a single node. Discover LWS's key features, including its dual-template architecture for different Pod types, fine-grained rolling update strategies, topology management, and all-or-nothing failure handling. Examine real-world adoption practices from industry leaders including NVIDIA and Google, and see practical demonstrations of LWS integration with popular inference engines such as vLLM and SGLang. Gain insights into how this cloud-native approach simplifies the deployment and management of distributed inference workloads while maintaining reliability and scalability in production environments.
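The dual-template architecture mentioned above can be sketched as a minimal LeaderWorkerSet manifest: one template for the leader Pod and one for the workers in each replica group. This is an illustrative sketch only — the resource name, image, group size, and port here are assumptions, not taken from the talk:

```yaml
# Hypothetical LeaderWorkerSet running vLLM across multiple hosts.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-llama            # illustrative name
spec:
  replicas: 2                 # two independent inference groups
  leaderWorkerTemplate:
    size: 4                   # 1 leader + 3 workers per group
    # "All-or-nothing" failure handling: restart the whole group
    # if any Pod in it fails.
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:           # Pod template for the leader
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest   # assumed image
          ports:
          - containerPort: 8000            # assumed serving port
    workerTemplate:           # Pod template for the workers
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest   # assumed image
```

Because leaders and workers have separate templates, each role can get its own image arguments and resources, while LWS schedules, scales, and rolls out the group as a single unit.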
Syllabus
New Pattern for Sailing Multi-host LLM Inference - Kante Yin, DaoCloud
Taught by
CNCF [Cloud Native Computing Foundation]