Help! My LLM Is a Resource Hog - How We Tamed Inference With Kubernetes and Open Source Muscle
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize large language model (LLM) inference performance and resource management using Kubernetes and open-source CNCF tools in this 26-minute conference talk. Through a real-world case study presented by speakers from Forrester Research and vCluster, discover practical solutions to common LLM deployment challenges, including slow inference, unpredictable GPU usage, and escalating costs. Learn to serve LLMs reliably with KServe and Kubeflow; benchmark and auto-scale with Volcano and KEDA to improve resource utilization and reduce latency; and monitor model performance and detect drift with Prometheus, Grafana, and OpenTelemetry. Gain insights from field-tested architectures, performance benchmarks, and lessons learned while building production-ready, efficient, and scalable LLM inference systems with entirely open-source tooling you can adopt immediately.
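As a taste of the kind of setup the talk covers, the sketch below shows a minimal KServe InferenceService for a GPU-backed LLM, paired with a KEDA ScaledObject that scales the deployment on a Prometheus latency metric. The model ID, resource values, metric query, and all names are illustrative assumptions, not taken from the talk:

```yaml
# Minimal KServe InferenceService serving an LLM on one GPU.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo                    # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface           # KServe's Hugging Face serving runtime
      args:
        - --model_name=llm-demo
        - --model_id=mistralai/Mistral-7B-Instruct-v0.2   # illustrative model
      resources:
        requests:
          nvidia.com/gpu: "1"       # pin one GPU per replica
          memory: 24Gi
        limits:
          nvidia.com/gpu: "1"
          memory: 24Gi
---
# KEDA ScaledObject scaling the predictor on a Prometheus metric.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-demo-scaler             # hypothetical name
spec:
  scaleTargetRef:
    name: llm-demo-predictor        # assumed deployment name created by KServe
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: avg(inference_request_latency_seconds)   # illustrative metric
        threshold: "2"              # scale out when avg latency exceeds 2s
```

This is a sketch of the general pattern, not the speakers' exact configuration; in practice the scaling metric, thresholds, and GPU sizing depend on the benchmarks the talk walks through.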
Syllabus
Help! My LLM Is a Resource Hog: How We Tamed Inference With Kubernetes... Aditya Soni & Hrittik Roy
Taught by
CNCF [Cloud Native Computing Foundation]