Fit-to-Serve - How a New DRA Capability for Dynamic Device Sharing Fits Into Distributed LLM Serving
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore how Dynamic Resource Allocation (DRA) capabilities enhance distributed large language model serving through this 24-minute conference talk from CNCF. Learn about llm-d, a community-driven framework that modernizes LLM serving at scale within Kubernetes using a modular architecture that separates prefill and decode operations. Discover how the new DRA capability enables dynamic resource capacity requests and adjustments for compute and network devices, moving beyond traditional GPU units to more granular resource allocation including MIG slices. Understand how DRA's device selection based on fine-grained attributes and topology awareness eliminates the need for workarounds or rigid resource pools. See practical demonstrations of how these DRA enhancements make the llm-d framework more feasible and cost-effective, while examining remaining challenges and gaining insights for implementation in cloud-native environments.
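To make the attribute-based device selection concrete, here is a minimal sketch of what a DRA request for a MIG slice might look like using the Kubernetes `resource.k8s.io` API. The device class name and the attribute key/value (`profile`, `"1g.5gb"`) are illustrative assumptions, not taken from the talk; the exact names depend on the vendor's DRA driver.

```yaml
# Hypothetical sketch: a ResourceClaimTemplate selecting a specific MIG
# profile via a CEL expression over driver-published device attributes.
# deviceClassName and the attribute key are assumed, driver-specific names.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: mig-slice-claim
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com   # assumed device class
        selectors:
        - cel:
            # Match only devices advertising the desired MIG profile
            expression: device.attributes["gpu.example.com"].profile == "1g.5gb"
```

A pod then references the claim template in its `resourceClaims`, letting the scheduler pick a fitting slice based on the published attributes and topology rather than a whole-GPU resource count.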
Syllabus
Fit-to-Serve: How a New DRA Capability for Dynamic Device... Sunyanan Choochotkaew & Tatsuhiro Chiba
Taught by
CNCF [Cloud Native Computing Foundation]