Overview
Explore the latest developments in Envoy AI Gateway, an open-source project designed for serving GenAI workloads in Kubernetes environments, in this 37-minute conference talk from CNCF. Learn about its rapidly evolving features, including transparent failover between models and model providers, token-based global rate limiting, and integration with external load-balancing policies that enable efficient use of inference resources. Discover improvements in buffer size management for retries to textual LLMs and support for disaggregated serving in llm-d. Gain insight into the use, configuration, and inner workings of these new features from presenters Yan Avlasov of Google and Takeshi Yoneda of Tetrate.io, and pick up practical knowledge for implementing AI gateway solutions in cloud native environments.
Syllabus
Evolution of Envoy AI Gateway - Yan Avlasov, Google & Takeshi Yoneda, Tetrate.io
Taught by
CNCF [Cloud Native Computing Foundation]