
Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn how to simplify the complex landscape of AI model serving on Kubernetes through a Helm-based approach that abstracts complexity while maintaining flexibility. Discover how to navigate the overwhelming array of technology choices in AI model serving, including inference servers like Ray Serve and Triton Inference Server, inference engines like vLLM, and orchestration platforms like Ray Cluster and KServe. Explore a solution that provides an accelerator-agnostic, consistent YAML interface for deploying and experimenting with various serving technologies without prematurely standardizing on a limited technology stack. Examine two concrete demonstrations of multi-node, multi-accelerator model serving with autoscaling: Ray Serve + vLLM + Ray Cluster, and LeaderWorkerSet + Triton Inference Server + vLLM + Ray Cluster + HPA. Understand how this approach enables teams to leverage the best tools for each specific use case while managing the inherent complexity of modern AI infrastructure deployment on Kubernetes.
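To make the "consistent YAML interface" idea concrete, a values file for such a chart might look like the sketch below. The chart layout, key names, and values here are illustrative assumptions, not the actual interface shown in the talk:

```yaml
# Hypothetical values.yaml for a unified model-serving Helm chart.
# All keys and values are illustrative, not the talk's actual schema.
model:
  name: meta-llama/Llama-2-70b-hf   # model to serve (example)

# Select the serving stack without touching the rest of the spec.
inferenceServer: rayserve    # e.g. rayserve | triton
inferenceEngine: vllm
orchestrator: raycluster     # e.g. raycluster | kserve

# Accelerator-agnostic resource request: the chart would map this
# to the cluster's device-plugin resource (e.g. nvidia.com/gpu).
accelerator:
  type: gpu
  countPerNode: 8
  nodes: 2                   # multi-node serving

autoscaling:                 # e.g. backed by HPA or Ray autoscaling
  enabled: true
  minReplicas: 1
  maxReplicas: 4
```

Under this kind of scheme, deploying reduces to something like `helm install my-llm ./serving-chart -f values.yaml`, and switching between stacks (say, Ray Serve to Triton) becomes an edit of a few keys rather than a rewrite of the Kubernetes manifests.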

Syllabus

Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts - Ajay Vohra & Tianlu Caron Zhang

Taught by

CNCF [Cloud Native Computing Foundation]

