Explore how to deploy and orchestrate large open-source inference models on Kubernetes in this 27-minute CNCF conference talk. The speakers show how to automate the deployment of heavyweight models such as Falcon and Llama 2, using Kubernetes Custom Resource Definitions (CRDs) and packaging large model files as container images. Learn how an HTTP server for inference calls streamlines deployment, and how preset configurations remove the need to tune deployment parameters by hand. Discover techniques for auto-provisioning GPU nodes that match each model's requirements, so users can deploy containerized models with minimal effort. Finally, see how a controller-based approach dynamically creates deployment workloads that make use of all available GPU nodes.

Syllabus

Effortless Scalability: Orchestrating Large Language Model Inference...- Joinal Ahmed & Nirav Kumar

Taught by

CNCF [Cloud Native Computing Foundation]
