Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to simplify the complex landscape of AI model serving on Kubernetes through a Helm-based approach that abstracts complexity while preserving flexibility. Discover how to navigate the overwhelming array of technology choices in AI model serving, including inference servers such as Ray Serve and Triton Inference Server, inference engines such as vLLM, and orchestration platforms such as Ray Cluster and KServe. Explore a solution that provides an accelerator-agnostic, consistent YAML interface for deploying and experimenting with various serving technologies without prematurely standardizing on a narrow technology stack. Examine two concrete demonstrations of multi-node, multi-accelerator model serving with autoscaling: Ray Serve + vLLM + Ray Cluster, and LeaderWorkerSet + Triton Inference Server + vLLM + Ray Cluster + HPA. Understand how this approach enables teams to use the best tools for each specific use case while managing the inherent complexity of deploying modern AI infrastructure on Kubernetes.
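To illustrate the idea of a consistent, accelerator-agnostic YAML interface, the sketch below shows what a Helm values file for such a chart might look like. The field names (`inference_server`, `inference_engine`, `orchestrator`, `accelerator`, and so on) are illustrative assumptions for this sketch, not the actual schema of the chart presented in the talk:

```yaml
# Hypothetical values.yaml sketch: one consistent interface,
# swappable serving technologies underneath.
model:
  name: my-llm                # illustrative model name
  replicas: 2

serving:
  inference_server: ray-serve # e.g. ray-serve or triton
  inference_engine: vllm      # engine used by the server
  orchestrator: ray-cluster   # e.g. ray-cluster or kserve

accelerator:
  type: gpu                   # accelerator-agnostic: gpu, tpu, etc.
  count_per_node: 4

autoscaling:
  enabled: true
  min_replicas: 1
  max_replicas: 8
```

Under this kind of scheme, switching from the first demo stack (Ray Serve + vLLM + Ray Cluster) to the second (LeaderWorkerSet + Triton + vLLM + HPA) would be a values change rather than a rewrite of raw Kubernetes manifests.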
Syllabus
Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts — Ajay Vohra & Tianlu Caron Zhang
Taught by
CNCF [Cloud Native Computing Foundation]