KServe Next - Advancing Generative AI Model Serving
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore the evolution of generative AI model serving infrastructure in this conference talk that traces the journey from custom deployment patterns to modern Kubernetes-native serving platforms. Discover the latest challenges in deploying and scaling large language models, including inference performance optimization, KV-cache management, distributed execution strategies, and cost optimization techniques. Learn about the groundbreaking KServe v0.17 release, which introduces enhanced support for generative AI workloads through a dedicated LLMInferenceService Custom Resource Definition designed specifically for LLM-serving capabilities such as disaggregated serving, advanced model and KV caching mechanisms, and seamless integration with the open source Envoy AI Gateway. Gain valuable insights into the cutting-edge technologies driving the next generation of AI applications and understand how to effectively prepare your infrastructure for the generative AI revolution, ensuring scalable, efficient, and interoperable model serving solutions.
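To make the LLMInferenceService concept concrete, here is a minimal sketch of what such a custom resource could look like. The field names and values below are illustrative assumptions for the capabilities the talk describes (disaggregated serving, gateway integration), not the authoritative KServe v0.17 schema; consult the KServe documentation for the actual API.

```yaml
# Hypothetical sketch of an LLMInferenceService custom resource.
# All spec fields are illustrative assumptions, not the real
# KServe v0.17 schema.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llm-example
spec:
  model:
    uri: hf://example-org/example-llm   # model source (assumed syntax)
  replicas: 2
  # Disaggregated serving: a separate prefill pool from the
  # decode replicas above (assumed fields)
  prefill:
    replicas: 1
  router:
    gateway: {}   # route traffic via Envoy AI Gateway (assumed field)
```

Applied with `kubectl apply -f`, a manifest along these lines would let the KServe controller reconcile the serving topology, rather than the user wiring up prefill/decode pools and gateway routes by hand.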
Syllabus
KServe Next: Advancing Generative AI Model Serving - Yuan Tang, Red Hat & Dan Sun, Bloomberg
Taught by
CNCF [Cloud Native Computing Foundation]