
CNCF [Cloud Native Computing Foundation]

LLM-Aware Load Balancing in Kubernetes: A New Era of Efficiency

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This keynote explains why traditional load balancing falls short for LLM workloads: the computational cost of a request varies widely with the prompt, the model, and autoregressive processing. It introduces new Kubernetes APIs designed specifically for routing LLM workloads, which let operators configure serving objectives and priorities for different use cases. The speakers demonstrate how these APIs integrate with Gateway API and can be adopted across Gateway API implementations to provide turnkey LLM routing support. Real-world examples show the efficiency improvements that LLM-aware load balancing strategies can deliver. These gains matter all the more as model multiplexing techniques such as LoRA add complexity to the serving landscape.
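To make the idea concrete, here is a minimal Python sketch of why request-count-based balancing can mislead for LLM traffic. It is not the Kubernetes API presented in the talk; `Replica`, `pick_replica`, `route`, the token-count cost estimate, and the adapter name `sql-lora` are all hypothetical names and assumptions chosen for illustration. The sketch routes each request to the replica with the least estimated queued token work, and prefers replicas that already have the requested LoRA adapter loaded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical view of one model-server replica (illustrative only)."""
    name: str
    queued_tokens: int = 0  # rough estimate of outstanding work
    loaded_adapters: set[str] = field(default_factory=set)


def pick_replica(replicas: list[Replica], adapter: str | None = None) -> Replica:
    """Prefer replicas that already hold the adapter, then the least-loaded one."""
    if not replicas:
        raise ValueError("no replicas available")
    candidates = [r for r in replicas if adapter in r.loaded_adapters] if adapter else []
    # Fall back to every replica when none has the adapter loaded.
    return min(candidates or replicas, key=lambda r: r.queued_tokens)


def route(replicas: list[Replica], prompt_tokens: int, max_new_tokens: int,
          adapter: str | None = None) -> Replica:
    """Pick a replica and charge it the request's estimated token cost."""
    target = pick_replica(replicas, adapter)
    # Assumption: cost is approximated by prompt length plus generation budget.
    target.queued_tokens += prompt_tokens + max_new_tokens
    if adapter:
        target.loaded_adapters.add(adapter)
    return target


if __name__ == "__main__":
    pods = [Replica("pod-a", loaded_adapters={"sql-lora"}), Replica("pod-b")]
    # One large adapter request, then two small base-model requests.
    print(route(pods, 4000, 500, adapter="sql-lora").name)
    print(route(pods, 20, 50).name)
    print(route(pods, 20, 50).name)
```

In this toy run, a round-robin balancer would send the third request back to `pod-a`, behind the 4,500-token job, while the token-aware picker keeps both small requests on `pod-b`. A real router would rely on live signals from the model servers rather than a static estimate.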

Syllabus

Keynote: LLM-Aware Load Balancing in Kubernetes: A New Era of Efficiency - C. Coleman & J. Shan (ISL)

Taught by

CNCF [Cloud Native Computing Foundation]
