
Sailing Multi-host Inference for LLM on Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn how to deploy distributed inference for large language models on Kubernetes using LeaderWorkerSet (LWS) and vLLM in this conference talk. Explore the challenges of serving large foundation models such as Llama 3.1-405B or DeepSeek R1, which cannot fit on a single node and therefore require distributed inference with model parallelism. Discover LeaderWorkerSet, a dedicated multi-host inference project developed under the guidance of Kubernetes SIG Apps and the Serving Working Group, which addresses these complexities through features including dual-template support for different Pod types, fine-grained rolling update strategies, topology management, and all-or-nothing failure handling. See practical demonstrations of deploying distributed inference workloads with the popular vLLM inference engine, known for its performance and ease of use, integrated with LWS on Kubernetes infrastructure. Gain insight into solving inference workload challenges that are increasingly prevalent in the cloud native ecosystem.
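To make the dual-template idea concrete, here is a minimal sketch of what an LWS resource for a multi-host vLLM deployment might look like. The resource name, image tag, model, and parallelism values are illustrative assumptions, not taken from the talk; the field layout follows the `leaderworkerset.x-k8s.io/v1` API, in which a group of pods (one leader plus workers) is treated as the unit of replication, rollout, and failure handling.

```yaml
# Sketch of a LeaderWorkerSet for multi-host vLLM inference.
# Names, images, and commands are illustrative placeholders.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 2                  # number of leader+worker groups (model replicas)
  leaderWorkerTemplate:
    size: 4                    # pods per group: 1 leader + 3 workers
    restartPolicy: RecreateGroupOnPodRestart  # all-or-nothing failure handling
    leaderTemplate:            # dual-template support: the leader pod spec
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest     # assumed image
          # Leader typically starts the distributed runtime and serves the API;
          # exact commands depend on the vLLM multi-node setup in use.
          resources:
            limits:
              nvidia.com/gpu: "8"
    workerTemplate:            # a separate template for worker pods
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest     # assumed image
          # Workers join the leader's distributed runtime; LWS injects
          # environment variables (e.g. the leader address) for this.
          resources:
            limits:
              nvidia.com/gpu: "8"
```

With this layout, scaling to more model replicas means increasing `spec.replicas`, while `leaderWorkerTemplate.size` fixes how many hosts jointly serve one copy of the model.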

Syllabus

Sailing Multi-host Inference for LLM on Kubernetes - Kay Yan, DaoCloud

Taught by

CNCF [Cloud Native Computing Foundation]

Reviews
