CNCF [Cloud Native Computing Foundation]

Production-Ready LLMs on Kubernetes: Patterns, Pitfalls, and Performance

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This technical presentation explores the challenges of deploying open-source Large Language Models (LLMs) on Kubernetes infrastructure, along with solutions to them. Learn from Priya Samuel and Luke Marsden as they share their practical experience implementing production-grade LLM systems. Through demonstrations, follow the deployment lifecycle from GPU configuration to optimization techniques including Flash Attention, quantization tradeoffs, and GPU sharing. The talk also covers architectural patterns using Ollama and vLLM, model weight management, context length optimization strategies, and production approaches to fine-tuning with Axolotl and multi-model serving with LoRAX. Walk away with a blueprint for building reliable, scalable LLM infrastructure on Kubernetes that addresses common pitfalls while maximizing performance.

Syllabus

Production-Ready LLMs on Kubernetes: Patterns, Pitfalls, and Performa... Priya Samuel & Luke Marsden

Taught by

CNCF [Cloud Native Computing Foundation]
