CNCF [Cloud Native Computing Foundation]

LLM-Aware Load Balancing in Kubernetes: A New Era of Efficiency

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This keynote explores why traditional load-balancing methods fall short when applied to Large Language Model (LLM) serving in Kubernetes environments. It introduces new Kubernetes APIs designed specifically for routing LLM workloads, which account for the computational challenges posed by varying prompt lengths, differences between models, and autoregressive processing. The talk covers how these APIs integrate with Gateway API and how different Gateway API implementations can adopt them to add LLM routing support. The speakers then demonstrate the project in practice through real-world examples that show significant efficiency improvements. This 16-minute talk from KubeCon + CloudNativeCon Europe features Clayton Coleman, Distinguished Engineer at Google, and Jiaxin Shan, Software Engineer at ByteDance, as they introduce a new era of cloud-native LLM infrastructure management.

Syllabus

Keynote: LLM-Aware Load Balancing in Kubernetes: A New Era of Effici... Clayton Coleman, Jiaxin Shan

Taught by

CNCF [Cloud Native Computing Foundation]

