CNCF [Cloud Native Computing Foundation]

Fit-to-Serve - How a New DRA Capability for Dynamic Device Sharing Fits Into Distributed LLM Serving

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore how Dynamic Resource Allocation (DRA) capabilities enhance distributed large language model serving through this 24-minute conference talk from CNCF. Learn about llm-d, a community-driven framework that modernizes LLM serving at scale within Kubernetes using a modular architecture that separates prefill and decode operations. Discover how the new DRA capability enables dynamic resource capacity requests and adjustments for compute and network devices, moving beyond traditional GPU units to more granular resource allocation including MIG slices. Understand how DRA's device selection based on fine-grained attributes and topology awareness eliminates the need for workarounds or rigid resource pools. See practical demonstrations of how these DRA enhancements make the llm-d framework more feasible and cost-effective, while examining remaining challenges and gaining insights for implementation in cloud-native environments.

Syllabus

Fit-to-Serve: How a New DRA Capability for Dynamic Device... Sunyanan Choochotkaew & Tatsuhiro Chiba

Taught by

CNCF [Cloud Native Computing Foundation]
