More Than Model Sharding - LWS and Distributed Inference

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn about LeaderWorkerSet (LWS), a Kubernetes solution designed to address the complex challenges of distributed inference for large language models beyond simple model sharding. Discover how native Kubernetes falls short when handling multi-node workloads such as Llama3.1-405B or DeepSeek-V3 (671B), which require distributed inference across multiple nodes using frameworks like vLLM with a Ray backend. Explore the key challenges:

- standalone StatefulSets with no coordination between them
- gang-scheduling demands
- uncontrolled startup order between leader and workers, causing boot lag
- HPA limitations that scale individual StatefulSets rather than the entire group
- stable index and rank requirements
- topology-aware grouping needs
- failure recovery, where a single pod or GPU failure can disrupt the overall inference service

Understand how LWS addresses these problems through optimized resource coordination with leader-worker sets, improved performance through co-location strategies, integrated HPA scaling of the entire LWS group, and an all-or-nothing restart policy that recovers the group as a cohesive unit.
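As a rough illustration of the pattern described above, a LeaderWorkerSet manifest might look like the sketch below. The overall shape (the `leaderworkerset.x-k8s.io/v1` API group, `leaderWorkerTemplate` with `size`, and the `RecreateGroupOnRestart` policy) follows the LWS project's API, but the image names, commands, and resource values here are hypothetical placeholders, not taken from the talk.

```yaml
# Sketch of an LWS group for multi-node vLLM inference (illustrative values).
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-inference          # hypothetical name
spec:
  replicas: 2                   # number of leader-worker groups; HPA scales this, not individual pods
  leaderWorkerTemplate:
    size: 4                     # pods per group: 1 leader + 3 workers, scheduled all-or-nothing
    restartPolicy: RecreateGroupOnRestart   # a single pod/GPU failure restarts the whole group
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: example.com/vllm:latest    # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "8"
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: example.com/vllm:latest    # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "8"
```

Scaling the group then means adjusting `spec.replicas` (manually or via HPA), so each scaling step adds or removes a complete leader-plus-workers unit rather than a lone StatefulSet pod.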

Syllabus

More Than Model Sharding: LWS & Distributed Inference - Peter Pan, Nicole Li & Shane Wang

Taught by

CNCF [Cloud Native Computing Foundation]

