KServe Next - Advancing Generative AI Model Serving
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore the evolution of generative AI model serving infrastructure in this conference talk that traces the journey from custom deployment patterns to modern Kubernetes-native serving platforms. Discover the latest challenges in deploying and scaling large language models, including inference performance optimization, KV-cache management, distributed execution strategies, and cost optimization techniques. Learn about the groundbreaking KServe v0.17 release, which introduces enhanced support for generative AI workloads through a dedicated LLMInferenceService Custom Resource Definition designed specifically for LLM-serving capabilities such as disaggregated serving, advanced model and KV caching mechanisms, and seamless integration with the open source Envoy AI Gateway. Gain valuable insights into the cutting-edge technologies driving the next generation of AI applications and understand how to effectively prepare your infrastructure for the generative AI revolution, ensuring scalable, efficient, and interoperable model serving solutions.
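To make the LLMInferenceService concept concrete, here is a minimal sketch of what such a custom resource could look like. The field names and values below are illustrative assumptions for the capabilities the talk describes (disaggregated serving, gateway integration), not the authoritative KServe v0.17 schema; consult the KServe documentation for the actual API.

```yaml
# Hypothetical sketch of an LLMInferenceService custom resource.
# All spec fields are illustrative assumptions, not the real
# KServe v0.17 schema.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llm-example
spec:
  model:
    uri: hf://example-org/example-llm   # model source (assumed syntax)
  replicas: 2
  # Disaggregated serving: a separate prefill pool from the
  # decode replicas above (assumed fields)
  prefill:
    replicas: 1
  router:
    gateway: {}   # route traffic via Envoy AI Gateway (assumed field)
```

Applied with `kubectl apply -f`, a manifest along these lines would let the KServe controller reconcile the serving topology, rather than the user wiring up prefill/decode pools and gateway routes by hand.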
Syllabus
KServe Next: Advancing Generative AI Model Serving - Yuan Tang, Red Hat & Dan Sun, Bloomberg
Taught by
CNCF [Cloud Native Computing Foundation]