Fit-to-Serve - How a New DRA Capability for Dynamic Device Sharing Fits Into Distributed LLM Serving
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore how Dynamic Resource Allocation (DRA) capabilities enhance distributed large language model serving through this 24-minute conference talk from CNCF. Learn about llm-d, a community-driven framework that modernizes LLM serving at scale within Kubernetes using a modular architecture that separates prefill and decode operations. Discover how the new DRA capability enables dynamic resource capacity requests and adjustments for compute and network devices, moving beyond traditional GPU units to more granular resource allocation including MIG slices. Understand how DRA's device selection based on fine-grained attributes and topology awareness eliminates the need for workarounds or rigid resource pools. See practical demonstrations of how these DRA enhancements make the llm-d framework more feasible and cost-effective, while examining remaining challenges and gaining insights for implementation in cloud-native environments.
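The granular, attribute-based allocation described above can be illustrated with a sketch of a Kubernetes DRA ResourceClaim. This is a hypothetical example, not taken from the talk: the device class name, attribute key, and MIG profile value are all assumptions for illustration.

```yaml
# Hypothetical sketch: a DRA ResourceClaim requesting a MIG slice
# by selecting on fine-grained device attributes via a CEL expression,
# instead of claiming a whole GPU. API group resource.k8s.io (DRA).
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: decode-worker-gpu        # hypothetical name for a decode-phase worker
spec:
  devices:
    requests:
    - name: mig-slice
      deviceClassName: gpu.example.com     # hypothetical device class
      selectors:
      - cel:
          # Match only devices advertising a specific MIG profile attribute
          expression: device.attributes["gpu.example.com"].profile == "1g.10gb"
```

A pod would then reference this claim in its `resourceClaims`, letting the scheduler pick a matching slice with topology awareness rather than relying on a rigid, pre-partitioned resource pool.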
Syllabus
Fit-to-Serve: How a New DRA Capability for Dynamic Device Sharing Fits Into Distributed LLM Serving — Sunyanan Choochotkaew & Tatsuhiro Chiba
Taught by
CNCF [Cloud Native Computing Foundation]