Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to simplify the complex landscape of AI model serving on Kubernetes through an innovative Helm-based approach that abstracts complexity while maintaining flexibility. Discover how to navigate the overwhelming array of technology choices in AI model serving, including inference servers like Ray Serve and Triton Inference Server, inference engines like vLLM, and orchestration platforms like Ray Cluster and KServe. Explore a solution that provides an accelerator-agnostic, consistent YAML interface for deploying and experimenting with various serving technologies without prematurely standardizing on a limited technology stack. Examine two concrete demonstrations of multi-node, multi-accelerator model serving with autoscaling: Ray Serve + vLLM + Ray Cluster, and LeaderWorkerSet + Triton Inference Server + vLLM + Ray Cluster + HPA. Understand how this approach enables teams to use the best tools for each specific use case while managing the inherent complexity of modern AI infrastructure deployment on Kubernetes.
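To give a sense of what an accelerator-agnostic, consistent YAML interface for swapping serving technologies could look like, here is a minimal hypothetical `values.yaml` sketch for such a Helm chart. All field names and values below are illustrative assumptions, not the actual schema used in the talk:

```yaml
# Hypothetical Helm values.yaml -- field names are illustrative,
# not the real chart's schema.
inference:
  server: rayserve            # alternative: triton
  engine: vllm                # inference engine backing the server
  orchestrator: raycluster    # alternatives: kserve, leaderworkerset
resources:
  accelerator: gpu            # accelerator-agnostic selector; the chart
  acceleratorsPerNode: 8      # maps this to vendor-specific resources
  nodes: 2                    # multi-node serving
autoscaling:
  enabled: true               # e.g. backed by HPA or Ray autoscaling
  minReplicas: 1
  maxReplicas: 4
```

The idea is that switching from one demonstrated stack to the other (for example, `rayserve`/`raycluster` to `triton`/`leaderworkerset`) changes only a few values rather than the entire set of Kubernetes manifests.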
Syllabus
Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts - Ajay Vohra & Tianlu Caron Zhang
Taught by
CNCF [Cloud Native Computing Foundation]