Help! My LLM Is a Resource Hog - How We Tamed Inference With Kubernetes and Open Source Muscle
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize large language model (LLM) inference performance and resource management using Kubernetes and open-source CNCF tools in this 26-minute conference talk. Through a real-world case study presented by speakers from Forrester Research and vCluster, discover practical solutions to common LLM deployment challenges: slow inference, unpredictable GPU usage, and escalating costs. Learn to serve LLMs reliably with KServe and Kubeflow; benchmark and auto-scale workloads with Volcano and KEDA to improve resource utilization and reduce latency; and monitor model performance and detect drift with Prometheus, Grafana, and OpenTelemetry. The talk shares field-tested architectures, performance benchmarks, and lessons learned from building production-ready, efficient, and scalable LLM inference systems entirely on open-source tooling you can adopt immediately.
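The serving-plus-autoscaling pattern the talk covers can be sketched as two Kubernetes manifests. This is a minimal illustration, not code from the talk: a KServe InferenceService hosting a model, and a KEDA ScaledObject scaling it on a Prometheus query. The model URI, metric name, and threshold are hypothetical placeholders.

```yaml
# KServe InferenceService serving an LLM (model format and storage URI are placeholders)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://example-org/example-llm"  # hypothetical model location
      resources:
        limits:
          nvidia.com/gpu: "1"
---
# KEDA ScaledObject scaling the predictor workload on a Prometheus latency query
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-demo-scaler
spec:
  scaleTargetRef:
    name: llm-demo-predictor  # Deployment KServe creates in raw-deployment mode
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: avg(request_latency_seconds)  # hypothetical metric exposed by the server
        threshold: "0.5"
```

In this sketch, KEDA polls Prometheus and adds replicas whenever average request latency exceeds 0.5 seconds, which is one way to trade GPU cost against tail latency as the talk describes.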
Syllabus
Help! My LLM Is a Resource Hog: How We Tamed Inference With Kubernetes... Aditya Soni & Hrittik Roy
Taught by
CNCF [Cloud Native Computing Foundation]