Production-Ready LLMs on Kubernetes: Patterns, Pitfalls, and Performance
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This technical presentation explores the challenges and solutions involved in deploying open-source Large Language Models (LLMs) on Kubernetes infrastructure. Learn from Priya Samuel and Luke Marsden as they share their practical experience implementing production-grade LLM systems. Through live demonstrations, they walk the complete deployment lifecycle, from GPU configuration to advanced optimization techniques including Flash Attention, quantization tradeoffs, and GPU sharing. The talk covers architectural patterns using Ollama and vLLM, effective model-weight management, context-length optimization strategies, and production approaches to fine-tuning with Axolotl and multi-model serving with LoRAX. Walk away with a blueprint for building reliable, scalable LLM infrastructure on Kubernetes that avoids common pitfalls while maximizing performance.
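As a point of reference for the GPU-configuration step the talk covers, a Kubernetes workload serving an LLM typically requests a GPU through the standard extended-resource mechanism. The sketch below is a minimal, illustrative Deployment for a vLLM server; the image tag, model name, and port are assumptions, not the presenters' actual manifest, and it presumes the NVIDIA device plugin is installed on the cluster.

```yaml
# Minimal illustrative sketch -- not the presenters' exact configuration.
# Assumes: NVIDIA device plugin installed; image and model names are examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest        # example image
        args: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]  # example model
        ports:
        - containerPort: 8000                 # vLLM's OpenAI-compatible API
        resources:
          limits:
            nvidia.com/gpu: 1                 # schedules the pod onto a GPU node
```

The `nvidia.com/gpu` resource limit is what ties the pod to GPU-equipped nodes; techniques discussed in the talk such as GPU sharing change how that resource is advertised and divided.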
Syllabus
Production-Ready LLMs on Kubernetes: Patterns, Pitfalls, and Performance — Priya Samuel & Luke Marsden
Taught by
CNCF [Cloud Native Computing Foundation]