Envoy Proxy: Evolved for Serving LLMs
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how Envoy proxy has been enhanced to address the challenges of deploying and scaling Large Language Models (LLMs) in production. It shows how Envoy's latest features optimize LLM serving, improve performance, and simplify integration into Kubernetes-native architectures. The speakers from Google cover three areas: advanced load balancing techniques for LLM inference that route requests to optimize resource utilization and minimize latency; how Envoy can be instrumented for compatibility with popular LLM serving specifications such as the OpenAI API specification; and security considerations for LLMs, including how to attach AI safety frameworks in the Envoy proxy data plane.
Syllabus
Envoy Proxy: Evolved for Serving LLMs - Vaibhav Katkade & Andres Guedez, Google
Taught by
CNCF [Cloud Native Computing Foundation]