
Inside NVIDIA Dynamo - Faster, Scalable AI Deployment

Anyscale via YouTube

Overview

This 34-minute conference talk from Ray Summit 2025 explains how NVIDIA Dynamo speeds up large-scale LLM inference through system-level optimizations. NVIDIA's Harry Kim shows how Dynamo integrates with high-performance inference engines such as vLLM, SGLang, and TensorRT-LLM. The core challenge it addresses is delivering large efficiency gains across distributed serving stacks as LLMs grow in size, context length, and real-world usage.

The talk covers three key innovations. The first is smart scheduling, which routes requests based on KV-cache hit rates and system load, autoscales capacity, and disaggregates the prefill and decode phases. The second is hierarchical memory management, which transparently uses HBM, CPU memory, local NVMe, and remote storage to minimize latency and maximize model capacity. The third is low-latency KV-cache transfer, which moves cache data quickly across nodes and memory tiers.

It also examines production-grade serving features: tools for identifying optimal disaggregated serving configurations offline, automated tuning based on real-time traffic, topology-aware gang scheduling for dynamically scaling prefill and decode workers, and LLM-specific fault-tolerance mechanisms for reliable serving at scale.

The session shows how organizations can achieve higher throughput, lower latency, and better cost efficiency across distributed LLM deployments. They also keep the flexibility to use their preferred inference engine, which makes large-scale inference more efficient, robust, and operationally simple.
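To make the idea of routing on KV-cache hit rates and load more concrete, here is a minimal Python sketch. It assumes each worker tracks chained hashes of the prompt blocks it has cached. Each worker is scored by how many leading prompt blocks it already holds, minus a penalty for its current number of active requests, and the request goes to the highest-scoring worker. Every name here (`Worker`, `block_hashes`, `route`, `BLOCK_SIZE`, and the two weights) is hypothetical. This is neither Dynamo's API nor its actual scoring formula.

```python
from dataclasses import dataclass, field
from hashlib import sha256

BLOCK_SIZE = 4  # hypothetical KV-cache block size, in tokens


def block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each full token block, chained with the previous hash so that
    each hash identifies the entire prefix up to that block."""
    hashes: list[str] = []
    prev = ""
    # Only full blocks are hashed; a trailing partial block is ignored.
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        prev = sha256((prev + "|" + ",".join(map(str, block))).encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class Worker:
    """Hypothetical serving worker: a load counter plus the prefix blocks it caches."""
    name: str
    active_requests: int = 0
    cached_blocks: set[str] = field(default_factory=set)

    def prefix_hit_blocks(self, hashes: list[str]) -> int:
        """Count how many leading blocks of the prompt are already cached here."""
        hits = 0
        for h in hashes:
            if h not in self.cached_blocks:
                break
            hits += 1
        return hits


def route(tokens: list[int], workers: list[Worker],
          overlap_weight: float = 1.0, load_weight: float = 0.5) -> Worker:
    """Pick the worker with the best (cache overlap - load) score."""
    hashes = block_hashes(tokens)

    def score(w: Worker) -> float:
        return overlap_weight * w.prefix_hit_blocks(hashes) - load_weight * w.active_requests

    best = max(workers, key=score)  # ties go to the first worker in the list
    best.active_requests += 1
    # Assumption: the chosen worker will hold this prompt's KV blocks after prefill.
    best.cached_blocks.update(hashes)
    return best
```

For example, if worker `b` already caches a 12-token prompt and has one active request, a follow-up request that extends that prompt goes to `b`. Its score is 3 × 1.0 − 1 × 0.5 = 2.5, which beats an idle worker with an empty cache at 0. With enough load on `b`, the penalty outweighs the cache hits and the idle worker wins instead. That trade-off between reusing cached prefixes and balancing load is the one the talk's scheduler is described as making.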

Syllabus

Inside NVIDIA Dynamo: Faster, Scalable AI Deployment | Ray Summit 2025

Taught by

Anyscale
