This 23-minute conference talk by Joinal Ahmed of Navatech AI covers deploying and orchestrating large open-source inference models on Kubernetes. It shows how Kubernetes Custom Resource Definitions (CRDs) can automate the deployment of heavyweight models such as Falcon and Llama 2, with the large model files managed through container images. Topics include serving inference calls through an HTTP server, eliminating manual tuning of deployment parameters, and auto-provisioning GPU nodes based on each model's requirements. The talk also explains how users can deploy their own containerized models by providing a pod template in the inference field of the workspace custom resource, from which deployment workloads are created dynamically to use all the GPU nodes.
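As a rough illustration of the workflow described above, the sketch below builds a workspace-style custom resource whose inference field carries a pod template, and sizes the GPU node count from the model's memory requirement. It is a minimal sketch under stated assumptions, not the schema used in the talk: the API group `example.com/v1alpha1`, the field names `resource.count` and `inference.template`, the image name, the port, and the memory figures are all hypothetical.

```python
import math


def gpu_nodes_needed(model_gpu_mem_gib: float, gpu_mem_per_node_gib: float) -> int:
    """Return the smallest node count whose combined GPU memory fits the model."""
    if model_gpu_mem_gib <= 0 or gpu_mem_per_node_gib <= 0:
        raise ValueError("memory sizes must be positive")
    return math.ceil(model_gpu_mem_gib / gpu_mem_per_node_gib)


def build_workspace(name: str, image: str,
                    model_gpu_mem_gib: float,
                    gpu_mem_per_node_gib: float) -> dict:
    """Build a hypothetical workspace manifest as a plain dict.

    The apiVersion, kind, and field layout are assumptions made for
    illustration; only the pod template shape (spec.containers) follows
    the standard Kubernetes pod template structure.
    """
    nodes = gpu_nodes_needed(model_gpu_mem_gib, gpu_mem_per_node_gib)
    return {
        "apiVersion": "example.com/v1alpha1",  # hypothetical API group
        "kind": "Workspace",                   # hypothetical kind
        "metadata": {"name": name},
        "resource": {"count": nodes},          # GPU nodes to provision
        "inference": {
            # Pod template supplied by the user for a containerized model.
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "inference",
                            "image": image,
                            # HTTP server port for inference calls (assumed).
                            "ports": [{"containerPort": 8080}],
                        }
                    ]
                }
            }
        },
    }


# Illustrative numbers only: a model needing 140 GiB on 80 GiB GPU nodes.
workspace = build_workspace("demo-llm", "registry.example.com/demo-llm:latest", 140, 80)
```

A controller watching such a resource could read `resource.count` to provision nodes and use `inference.template` to create the deployment workload; how the operator in the talk actually does this is not specified in the description.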

Syllabus

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes - Joinal Ahmed

Taught by

CNCF [Cloud Native Computing Foundation]

