Cloud Native Inference at Scale - Unlocking LLM Deployments with KServe
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore how to deploy and scale large language models (LLMs) efficiently using KServe, an open-source Kubernetes-native serving platform designed to address the unique challenges of LLM inference. Learn about the complexities that differentiate LLM workloads from traditional machine learning models, including handling long prompts, token-by-token generation, bursty traffic patterns, and maintaining high GPU utilization. Discover KServe's approach to scalable model serving on Kubernetes, with seamless integration that enables reproducible, resilient, and cost-efficient deployments.

Understand how deterministic scheduling and token-aware request handling work through the Kubernetes inference scheduler using the Gateway API Inference Extension and various execution strategies. Examine distributed and disaggregated inferencing capabilities with LLM Inference Service for advanced serving scenarios, and gain insights into solving request routing, autoscaling, and scheduling challenges that are significantly more complex than typical model serving use cases.
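To make the Kubernetes-native approach concrete, the sketch below shows what a minimal KServe `InferenceService` manifest for an LLM might look like. The model name, model ID, and resource values are illustrative assumptions, not taken from the talk; consult the KServe documentation for the fields supported by your KServe version and serving runtime.

```yaml
# Hypothetical example: serving a Hugging Face LLM via KServe.
# Names, model ID, and resource limits are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface          # runtime that loads Hugging Face models
      args:
        - --model_name=example-llm
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct  # illustrative model
      resources:
        limits:
          nvidia.com/gpu: "1"      # one GPU per replica (adjust to your cluster)
```

Applying a manifest like this asks KServe to create and manage the serving deployment, routing, and autoscaling for the model, which is the "seamless integration" the talk describes.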
Syllabus
CNCF On-Demand: Cloud Native Inference at Scale - Unlocking LLM Deployments with KServe
Taught by
CNCF [Cloud Native Computing Foundation]