Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore the intricacies of deploying and orchestrating large open-source inference models on Kubernetes in this 23-minute conference talk from CNCF. Discover how to automate the deployment of heavyweight models like Falcon and Llama 2 using Kubernetes Custom Resource Definitions (CRDs) for seamless management of large model files through container images. Learn about streamlining deployment with an HTTP server for inference calls, eliminating manual tuning of deployment parameters, and auto-provisioning GPU nodes based on specific model requirements. Gain insights into empowering users to deploy containerized models effortlessly by providing pod templates in the workspace custom resource inference field. Understand how the controller dynamically creates deployment workloads utilizing all GPU nodes, ensuring optimal resource utilization in the AI/ML landscape.
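The workspace custom resource described above might look roughly like the following sketch. The field names (`inference`, `podTemplate`, `instanceType`) and the API group are illustrative assumptions for this summary, not the exact CRD schema presented in the talk:

```yaml
# Hypothetical sketch of a workspace custom resource; field names are
# illustrative assumptions, not the exact schema from the talk.
apiVersion: example.io/v1alpha1
kind: Workspace
metadata:
  name: llama-2-inference
resource:
  # The controller auto-provisions GPU nodes of this type based on
  # the model's requirements.
  instanceType: Standard_NC12s_v3
  count: 2
inference:
  # Users provide a pod template for their containerized model; the
  # controller dynamically creates a deployment workload that spans
  # all provisioned GPU nodes.
  podTemplate:
    spec:
      containers:
        - name: llama-2
          image: registry.example.com/llama-2:latest
          ports:
            - containerPort: 80  # HTTP server receiving inference calls
```

Applying a resource like this is what replaces the manual tuning of deployment parameters: the controller reads the model requirements, provisions matching GPU nodes, and wires up the HTTP inference endpoint without further user input.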
Syllabus
Effortless Scalability: Orchestrating Large Language Model Inference... Rohit Ghumare & Joinal Ahmed
Taught by
CNCF [Cloud Native Computing Foundation]