
From Pull To Predict - Accelerating AI Model Deployment on Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn to accelerate AI model deployment on Kubernetes through advanced optimization techniques in this 30-minute conference talk. Discover how to tackle deployment latency and resource utilization challenges when working with large AI models in Kubernetes environments. Explore the deployment of a 7B-parameter Large Language Model using Ray and vLLM for scaling and serving, while implementing three critical optimizations:

- SOCI (Seekable OCI) for lazy loading of container images, so containers can start before the entire image has been downloaded
- an optimized storage layer that keeps models pre-downloaded for rapid access
- intelligent node provisioning with Karpenter for dynamic resource allocation

Compare standard deployment approaches against the optimized implementation to understand the differences in startup times, resource usage, and operational costs. Gain practical implementation steps for applying these techniques to your own Kubernetes environments to significantly improve AI model deployment efficiency and reduce operational overhead.
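The core idea behind the SOCI optimization is that a container can start serving before every image layer is pulled, fetching layers on demand instead. The following is a minimal, library-free sketch of that lazy-loading idea (not actual SOCI code, which works at the containerd snapshotter level via a per-layer index); the layer names and fetch counts here are invented purely for illustration:

```python
"""Sketch of eager vs. lazy image-layer loading (illustrative only).

Real SOCI builds a seekable index (ztoc) per OCI layer so containerd
can mount the image and fetch file contents on demand from the
registry. Here a simple fetch counter stands in for network pulls.
"""

FETCHES = []  # records every simulated registry pull


def fetch(layer_name):
    """Stand-in for downloading one layer blob from the registry."""
    FETCHES.append(layer_name)
    return f"contents-of-{layer_name}"


class EagerImage:
    """Standard pull: every layer is downloaded before the container starts."""

    def __init__(self, layers):
        # All layers pulled up front, even ones the workload never touches.
        self.blobs = {name: fetch(name) for name in layers}

    def read(self, name):
        return self.blobs[name]


class LazyImage:
    """SOCI-style start: a layer is fetched only on first access."""

    def __init__(self, layers):
        self.layers = set(layers)
        self.blobs = {}  # nothing downloaded at startup

    def read(self, name):
        if name not in self.blobs:  # on-demand fetch
            self.blobs[name] = fetch(name)
        return self.blobs[name]


layers = ["base-os", "python-runtime", "model-weights", "debug-tools"]

FETCHES.clear()
EagerImage(layers)            # 4 pulls before the container can start
eager_cost = len(FETCHES)

FETCHES.clear()
lazy = LazyImage(layers)      # 0 pulls at startup
lazy.read("model-weights")    # 1 pull, only for the layer actually used
lazy_cost = len(FETCHES)

print(eager_cost, lazy_cost)  # → 4 1
```

The startup-time win in the talk comes from exactly this asymmetry: the eager path pays for every layer before the first request, while the lazy path pays only for what the model server actually reads.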

Syllabus

From Pull To Predict: Accelerating AI Model Deployment on Kubernetes - Lucas Duarte & Tiago Reichert

Taught by

CNCF [Cloud Native Computing Foundation]
