Production-Ready LLMs on Kubernetes: Patterns, Pitfalls, and Performance
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This technical presentation explores the challenges of deploying open-source Large Language Models (LLMs) on Kubernetes and the solutions to them. Priya Samuel and Luke Marsden share their practical experience implementing production-grade LLM systems. Through demonstrations, they cover the complete deployment lifecycle, from GPU configuration to advanced optimization techniques including Flash Attention, quantization tradeoffs, and GPU sharing. The talk also covers architectural patterns using Ollama and vLLM, model weight management, context length optimization strategies, and production approaches to fine-tuning with Axolotl and multi-model serving with LoRAX. The result is a blueprint for building reliable, scalable LLM infrastructure on Kubernetes that addresses common pitfalls while maximizing performance.
Syllabus
Production-Ready LLMs on Kubernetes: Patterns, Pitfalls, and Performa... Priya Samuel & Luke Marsden
Taught by
CNCF [Cloud Native Computing Foundation]