Overview
This conference talk traces the evolution from traditional stateless microservices to modern GenAI platforms and addresses the challenges of serving AI inference workloads at scale. Learn how the shift from simple REST APIs to streaming tokens, prompt orchestration, and GPU-aware routing exposes the limitations of traditional gateways and calls for new architectural approaches. The talk presents a real-world reference architecture built with open-source tools, including Envoy AI Gateway and KServe, that supports dynamic model-based routing, token-level rate limiting, secure upstream authentication, comprehensive observability, and multi-provider failover. It explains why these features have become requirements rather than optional enhancements for reliable AI inference systems. You will gain practical insights into routing, serving, and monitoring LLM traffic, see how current CNCF tools are adapting to the demands of the GenAI era, and leave with a concrete blueprint for implementing scalable AI inference infrastructure.
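To make the idea of token-level rate limiting concrete, here is a minimal sketch in Python. It assumes a simple fixed-window budget per client and is not taken from the talk, Envoy AI Gateway, or KServe; the class name `TokenBudgetLimiter`, its methods, and the client ID are all hypothetical. The point it illustrates is that the budget is charged in LLM tokens after a response completes, not in request count.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TokenBudgetLimiter:
    """Hypothetical fixed-window limiter that meters LLM tokens, not requests."""

    tokens_per_window: int
    window_seconds: float = 60.0
    # Injectable clock so the behaviour can be checked without sleeping.
    clock: Callable[[], float] = time.monotonic
    # client_id -> (window_start, tokens_used_in_window)
    _usage: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def _current(self, client_id: str) -> Tuple[float, int]:
        """Return the client's active window, starting a fresh one if expired."""
        now = self.clock()
        start, used = self._usage.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, used = now, 0
        return start, used

    def allow(self, client_id: str) -> bool:
        """Admit a request only while the client still has token budget left."""
        _, used = self._current(client_id)
        return used < self.tokens_per_window

    def record(self, client_id: str, tokens: int) -> None:
        """Charge the tokens a finished response actually consumed."""
        start, used = self._current(client_id)
        self._usage[client_id] = (start, used + tokens)


# Usage with a fake clock (assumed values, for illustration only).
fake_now = [0.0]
limiter = TokenBudgetLimiter(
    tokens_per_window=100, window_seconds=60.0, clock=lambda: fake_now[0]
)
first = limiter.allow("team-a")        # budget untouched, request admitted
limiter.record("team-a", 120)          # one large completion uses 120 tokens
second = limiter.allow("team-a")       # budget exhausted within this window
fake_now[0] = 61.0
third = limiter.allow("team-a")        # window expired, budget resets
```

Because the cost of a streamed completion is only known once it finishes, this sketch checks the budget before a request and charges it afterwards, so a single large response can overshoot the limit. A production gateway would likely use shared state and a more refined policy.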
Syllabus
Inference Awakens: Tools for the Age of GenAI - Alexa Griffith, Bloomberg & Erica Hughberg, Tetrate
Taught by
CNCF [Cloud Native Computing Foundation]