Best Practices for Deploying LLM Inference, RAG and Fine-Tuning Pipelines on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to effectively deploy, scale, and manage Large Language Model (LLM) inference pipelines on Kubernetes in this technical conference talk from NVIDIA experts. Discover essential best practices for implementing common patterns including inference, retrieval-augmented generation (RAG), and fine-tuning workflows. Master techniques for reducing inference latency through model caching, optimizing GPU resource utilization with efficient scheduling strategies, handling multi-GPU/node configurations, and implementing auto-quantization. Explore methods for enhancing security through Role-Based Access Control (RBAC), setting up comprehensive monitoring, configuring auto-scaling, and supporting air-gapped cluster deployments. Follow demonstrations of building flexible pipelines using both a lightweight standalone operator-pattern tool and KServe, an open-source AI inference platform. Gain practical knowledge for post-deployment management to improve the performance, efficiency, and security of LLM deployments in Kubernetes environments.
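As a concrete illustration of the KServe operator pattern the talk demonstrates, the sketch below shows a minimal `InferenceService` manifest combining several of the practices mentioned above: serving a model from a pre-populated cache to reduce cold-start latency, requesting GPU resources explicitly, and setting replica bounds for autoscaling. The model name, storage URI, and resource figures are illustrative assumptions, not values from the talk.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo            # hypothetical deployment name
spec:
  predictor:
    minReplicas: 1          # keep one warm replica to avoid cold starts
    maxReplicas: 4          # upper bound for KServe autoscaling
    model:
      modelFormat:
        name: huggingface
      # Pre-cached model weights on a PVC so pods skip re-downloading
      # the model at startup (path is an assumption for this sketch).
      storageUri: "pvc://model-cache/llm-weights"
      resources:
        limits:
          nvidia.com/gpu: "1"   # request a dedicated GPU for inference
```

Applying this with `kubectl apply -f` would have KServe create and manage the serving deployment; the same manifest shape extends to multi-GPU setups by raising the `nvidia.com/gpu` limit.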
Syllabus
Best Practices for Deploying LLM Inference, RAG and Fine-Tuning Pipelines on Kubernetes, presented by Meenakshi Kaushik & Shiva Krishna Merla
Taught by
CNCF [Cloud Native Computing Foundation]