Cloud Native Inference at Scale - Unlocking LLM Deployments with KServe

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore how to deploy and scale large language models (LLMs) efficiently using KServe, an open-source, Kubernetes-native model serving platform designed to address the unique challenges of LLM inference. Learn what sets LLM workloads apart from traditional machine learning models, including long prompts, token-by-token generation, bursty traffic patterns, and the need to keep expensive GPUs highly utilized. Discover how KServe's Kubernetes-native design enables reproducible, resilient, and cost-efficient deployments. Understand how deterministic scheduling and token-aware request handling work through the Kubernetes inference scheduler, built on the Gateway API Inference Extension, and its various execution strategies. Examine distributed and disaggregated inference with LLMInferenceService for advanced serving scenarios, and gain insight into request routing, autoscaling, and scheduling challenges that are significantly more complex than those of typical model serving.
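
To make the deployment flow concrete, here is a minimal sketch (illustrative, not from the talk) that creates an LLM InferenceService with the kserve Python SDK. The model id, resource sizes, and namespace are assumptions, and the Hugging Face runtime stands in for whichever serving runtime your cluster actually provides.

# A minimal sketch (illustrative, not from the talk): deploying an LLM
# as a KServe InferenceService via the kserve Python SDK. The model id,
# resource sizes, and namespace below are assumptions.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1ModelFormat,
    V1beta1ModelSpec,
    V1beta1PredictorSpec,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="llm-demo", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            model=V1beta1ModelSpec(
                # KServe's Hugging Face serving runtime; substitute the
                # runtime and model your cluster provides.
                model_format=V1beta1ModelFormat(name="huggingface"),
                args=[
                    "--model_name=llama",
                    "--model_id=meta-llama/Llama-3.1-8B-Instruct",
                ],
                resources=client.V1ResourceRequirements(
                    requests={"nvidia.com/gpu": "1", "memory": "24Gi"},
                    limits={"nvidia.com/gpu": "1", "memory": "24Gi"},
                ),
            )
        )
    ),
)

# Submit to the cluster; KServe reconciles this into the underlying
# deployment, routing, and autoscaling resources for the model server.
KServeClient().create(isvc)

From there, the request routing and token-aware scheduling behavior described above is layered on top of the running service.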

Syllabus

CNCF On-Demand: Cloud Native Inference at Scale - Unlocking LLM Deployments with KServe

Taught by

CNCF [Cloud Native Computing Foundation]
