Overview
Explore routing strategies for stateful AI workloads in Kubernetes in this 30-minute conference talk from CNCF. Learn why traditional Kubernetes routing falls short for generative AI applications, which need context-aware routing to maximize performance and reduce costs. The talk walks through layered routing strategies, from basic round-robin to KV-Cache-aware load balancing, and explains when to apply each one and what it means for performance. The speakers draw on their experience developing llm-d, a framework built on the Kubernetes Gateway API Inference Extension through a collaboration between Google, IBM Research, and Red Hat. Topics include routing patterns for long-context and sessionful traffic, global cache indices and local offloading for routing decisions, and benchmarks showing improvements in latency, cache hit rates, and GPU utilization. The talk also covers practical ways to adopt cache-aware routing without major infrastructure changes, which makes it relevant to anyone scaling multi-turn, agentic, or LLM-powered workloads in Kubernetes environments.
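To make the contrast between round-robin and cache-aware routing concrete, here is a minimal sketch in Python. It is not taken from the talk or from llm-d: the class names, the block size, and the in-memory index are all hypothetical, and a real system would track KV-cache blocks reported by the model servers rather than token prefixes kept in a dictionary.

```python
from itertools import cycle

BLOCK = 4  # tokens per cache block (illustrative value, not from the talk)


def prefix_blocks(tokens, block=BLOCK):
    """Return one hashable prefix per full block of tokens."""
    return [tuple(tokens[:i]) for i in range(block, len(tokens) + 1, block)]


class RoundRobinRouter:
    """Baseline: ignores request content and rotates through the pods."""

    def __init__(self, pods):
        self._pods = cycle(pods)

    def route(self, tokens):
        return next(self._pods)


class PrefixAwareRouter:
    """Hypothetical cache-aware router: prefers the pod that already
    holds the longest prefix of the request, then the least-loaded pod."""

    def __init__(self, pods):
        self.pods = list(pods)
        # Toy "global cache index": pod name -> set of cached prefixes.
        self.index = {pod: set() for pod in self.pods}
        self.load = {pod: 0 for pod in self.pods}

    def _cached_blocks(self, pod, blocks):
        # Count consecutive leading blocks already cached on this pod.
        hits = 0
        for b in blocks:
            if b not in self.index[pod]:
                break
            hits += 1
        return hits

    def route(self, tokens):
        blocks = prefix_blocks(tokens)
        best = max(
            self.pods,
            key=lambda p: (self._cached_blocks(p, blocks), -self.load[p]),
        )
        # Assume the chosen pod now caches every block of this request.
        self.index[best].update(blocks)
        self.load[best] += 1
        return best


if __name__ == "__main__":
    router = PrefixAwareRouter(["pod-a", "pod-b"])
    turn_1 = list(range(8))           # first turn of a session
    turn_2 = list(range(12))          # follow-up turn sharing the same prefix
    other = list(range(100, 108))     # unrelated request
    print(router.route(turn_1), router.route(turn_2), router.route(other))
```

In this toy setup the follow-up turn lands on the pod that served the first turn, because that pod already holds the shared prefix, while the unrelated request goes to the less loaded pod. A round-robin router would send the follow-up turn to the other pod, which would have to recompute the shared prefix from scratch.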
Syllabus
Routing Stateful AI Workloads in Kubernetes - Maroon Ayoub, IBM & Michey Mehta, Red Hat
Taught by
CNCF [Cloud Native Computing Foundation]