
Navigating the Rapid Evolution of Large Model Inference - Where Does Kubernetes Fit?

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore the intersection of large language model inference and Kubernetes infrastructure in this 30-minute conference talk from CNCF. Learn from Working Group Serving chairs and industry leaders from ByteDance, Red Hat, Google, and Microsoft as they address the decisions infrastructure teams face when deploying advanced LLM serving patterns. Discover how emerging techniques such as model and expert parallelism, prefill/decode disaggregation, multi-LoRA serving, and KV cache offloading challenge traditional serving architectures and push beyond conventional Kubernetes primitives. Gain practical frameworks for evaluating when to extend Kubernetes core functionality versus leveraging specialized runtimes and ecosystem projects. Understand the balance between maintaining control and ensuring observability while adapting infrastructure to the rapidly evolving demands of large-scale LLM workloads. Acquire actionable insights for navigating the blurry boundaries between Kubernetes-native capabilities, inference engines, and specialized tooling in the dynamic landscape of AI infrastructure.

Syllabus

Navigating the Rapid Evolution of Large Model Inference - Where Does Kubernetes Fit? - Jiaxin Shan, Yuan Tang, Sergey Kanzhelev & Rita Zhang

Taught by

CNCF [Cloud Native Computing Foundation]

