

Large Scale Distributed LLM Inference with LLM-D and Kubernetes

Devoxx via YouTube

Overview

Explore large-scale distributed inference for Large Language Models using LLM-D and Kubernetes in this conference talk. Learn how to overcome the significant challenges of deploying LLMs in production, including high GPU/TPU costs, hardware scarcity, and the complex balance between performance, availability, scalability, and cost-efficiency. Discover LLM-D, a cloud-native, Kubernetes-based, high-performance distributed LLM inference framework designed to provide fast time-to-value and competitive performance per dollar across diverse hardware accelerators.

The talk begins with a gentle introduction to inference on Kubernetes before diving deep into LLM-D's architecture and the specific challenges it addresses. Understand how LLM-D builds upon existing projects like vLLM, Prometheus, and the Kubernetes Gateway API to create an opinionated set of components optimized for GenAI deployments. Examine the framework's KV-cache aware routing and disaggregated serving capabilities that operationalize generative AI at scale. Gain insights from this Apache 2.0-licensed project created by the makers of vLLM from Red Hat, Google, and ByteDance, and learn how to effectively serve LLMs in critical business applications while maintaining optimal resource utilization.
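The KV-cache aware routing mentioned above rests on a simple idea: requests that share a prompt prefix (for example, a common system prompt) should land on the replica that already holds that prefix's KV cache, so the prefill work is reused rather than recomputed. The sketch below illustrates that idea only; the class name, character-based prefix hashing, and tie-breaking policy are illustrative assumptions, not LLM-D's actual API or algorithm.

```python
# Illustrative sketch of prefix-affinity ("KV-cache aware") routing.
# All names here are hypothetical; real systems hash token blocks, not
# raw characters, and track cache occupancy reported by the engines.
import hashlib

class PrefixAwareRouter:
    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.prefix_owner = {}  # prefix hash -> replica assumed to cache it

    def _prefix_key(self, prompt, block=64):
        # Hash the first `block` characters as a stand-in for token-block hashing.
        return hashlib.sha256(prompt[:block].encode()).hexdigest()

    def route(self, prompt):
        key = self._prefix_key(prompt)
        if key in self.prefix_owner:
            # Cache affinity hit: send to the replica whose KV cache is warm.
            return self.prefix_owner[key]
        # Cache miss: assign the prefix to the least-loaded replica
        # (here, the one owning the fewest tracked prefixes).
        counts = {r: 0 for r in self.replicas}
        for owner in self.prefix_owner.values():
            counts[owner] += 1
        target = min(self.replicas, key=lambda r: counts[r])
        self.prefix_owner[key] = target
        return target

router = PrefixAwareRouter(["pod-a", "pod-b"])
shared = "System: you are a helpful assistant. Answer concisely. User: "
first = router.route(shared + "What is Kubernetes?")
second = router.route(shared + "What is vLLM?")
# Both requests share the system-prompt prefix, so they route to the
# same replica and its cached prefill can be reused.
```

Disaggregated serving takes the complementary step of splitting prefill (compute-bound) and decode (memory-bandwidth-bound) onto separate replica pools, so each can be scaled and scheduled independently.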

Syllabus

Large Scale Distributed LLM Inference with LLM-D and Kubernetes by Abdel Sghiouar

Taught by

Devoxx

