
Routing Stateful AI Workloads in Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore advanced routing strategies for stateful AI workloads in Kubernetes in this 30-minute conference talk from CNCF. Learn why traditional Kubernetes routing approaches fall short for modern generative AI applications, which require context-aware routing to maximize performance and reduce costs. Discover layered routing strategies ranging from basic round-robin to sophisticated KV-cache-aware load balancing, and understand when to apply each approach and what its performance implications are.

Gain insights from the speakers' experience developing llm-d, a framework built on the Kubernetes Gateway API Inference Extension in collaboration among Google, IBM Research, and Red Hat. Master routing patterns for long-context and sessionful traffic, implement global cache indices and local offloading for intelligent routing decisions, and examine benchmarks demonstrating improvements in latency, cache hit rates, and GPU utilization. Understand practical methods for adopting cache-aware routing without major infrastructure changes, making this essential viewing for anyone scaling multi-turn, agentic, or LLM-powered workloads in Kubernetes environments.
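To make the idea concrete, here is a minimal sketch (not taken from the talk or from llm-d) of what KV-cache-aware routing can look like: each replica advertises which prompt prefixes it has cached (a stand-in for a global cache index), and the router prefers the replica with the longest cached prefix, falling back to the least-loaded replica on a tie. All names and data structures here are hypothetical illustrations.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """A hypothetical model-serving replica with a view of its KV cache."""
    name: str
    cached_prefixes: set[str] = field(default_factory=set)  # stand-in for a global cache index
    active_requests: int = 0                                # simple load signal

def cache_score(replica: Replica, prompt: str) -> int:
    # Length of the longest cached prefix that matches this prompt.
    return max((len(p) for p in replica.cached_prefixes if prompt.startswith(p)), default=0)

def route(replicas: list[Replica], prompt: str) -> Replica:
    # Prefer the highest cache overlap; break ties by lowest current load.
    return max(replicas, key=lambda r: (cache_score(r, prompt), -r.active_requests))

replicas = [
    Replica("pod-a", cached_prefixes={"Hello"}, active_requests=2),
    Replica("pod-b", cached_prefixes={"Hello, world"}, active_requests=5),
    Replica("pod-c"),
]

# A prompt that extends a cached conversation lands on the replica holding it.
print(route(replicas, "Hello, world! Tell me more.").name)  # → pod-b
# With no cache overlap anywhere, routing degrades to least-loaded.
print(route(replicas, "Unrelated prompt").name)             # → pod-c
```

Real systems score on token-block hashes rather than string prefixes and combine the cache signal with queue depth and GPU memory pressure, but the layering is the same: cache awareness first, load balancing as the fallback.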

Syllabus

Routing Stateful AI Workloads in Kubernetes - Maroon Ayoub, IBM & Michey Mehta, Red Hat

Taught by

CNCF [Cloud Native Computing Foundation]

