From Pull To Predict - Accelerating AI Model Deployment on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn to accelerate AI model deployment on Kubernetes through advanced optimization techniques in this 30-minute conference talk. Discover how to tackle deployment latency and resource utilization challenges when working with large AI models in Kubernetes environments.

Explore the deployment of a 7B-parameter Large Language Model using Ray and vLLM for scaling and serving, while implementing three critical optimizations:

- SOCI (Seekable OCI) for lazy loading of container images, which allows containers to start without first downloading the entire image
- an optimized storage layer that maintains pre-downloaded models for rapid access
- intelligent node provisioning using Karpenter for dynamic resource allocation

Compare standard deployment approaches against the optimized implementation to understand the differences in startup times, resource usage, and operational costs. Gain practical implementation steps for these techniques that can be applied to your own Kubernetes environments to significantly improve AI model deployment efficiency and reduce operational overhead.
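To give a feel for the Karpenter-based node provisioning the talk covers, here is a minimal NodePool sketch for GPU inference workloads. All specifics — the pool name, the `default` EC2NodeClass, the instance family, and the limits — are illustrative assumptions, not details taken from the talk:

```yaml
# Hypothetical Karpenter NodePool for GPU inference nodes (illustrative sketch).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-inference            # assumed name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumed pre-existing EC2NodeClass
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5"]         # assumed GPU instance family
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      # Keep non-GPU pods off these (typically expensive) nodes.
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
  limits:
    nvidia.com/gpu: 8            # cap total provisioned GPUs (assumed)
  disruption:
    # Scale empty GPU nodes back down quickly to control cost.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s
```

With a pool like this in place, Karpenter launches a matching GPU node only when a pending model-serving pod requests one, and reclaims it once the node is empty — the "dynamic resource allocation" behavior described above.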
Syllabus
From Pull To Predict: Accelerating AI Model Deployment on Kubernetes - Lucas Duarte & Tiago Reichert
Taught by
CNCF [Cloud Native Computing Foundation]