Overview
Discover NVIDIA Dynamo, a distributed inference serving framework designed for deploying reasoning large language models (LLMs) across multi-node environments, in this advanced technical session. Explore the framework's architecture and the key components that enable seamless scaling within data centers while driving advanced inference optimization. Learn about cutting-edge inference serving techniques, including disaggregated serving, which separates the prefill and decode phases of LLM inference onto different workers to optimize request handling and increase inference throughput. The session also covers how to deploy the framework quickly using NVIDIA NIM. Presented by NVIDIA experts Harry Kim, Neelay Shah, Ryan Olson, and Tanmay Verma, this 89-minute technical presentation is a replay of NVIDIA GTC Session ID S73042 and features NVIDIA technologies including TensorRT, DALI, NVLink/NVSwitch, and Triton.
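The disaggregated-serving idea mentioned above can be sketched conceptually: the compute-bound prefill phase processes the whole prompt and produces a KV cache, which is then handed to a separate decode phase that generates tokens one at a time. The sketch below is a toy illustration of that split under those assumptions; every class, function, and token name is hypothetical and is not NVIDIA Dynamo's actual API.

```python
# Toy illustration of disaggregated LLM serving: prefill and decode run as
# separate stages, as they would on separate worker pools. All names here
# are illustrative -- this is NOT NVIDIA Dynamo's actual API.
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: list            # input tokens
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)   # stand-in for the KV cache
    output: list = field(default_factory=list)     # generated tokens


def prefill_worker(req: Request) -> Request:
    """Compute-bound phase: process the entire prompt in one pass,
    producing the KV cache that the decode phase will reuse."""
    req.kv_cache = list(req.prompt)  # pretend: one cache entry per prompt token
    return req


def decode_worker(req: Request) -> Request:
    """Memory-bandwidth-bound phase: generate tokens one at a time,
    extending the KV cache handed over from the prefill pool."""
    for i in range(req.max_new_tokens):
        tok = f"tok{i}"              # stand-in for sampling a real token
        req.kv_cache.append(tok)
        req.output.append(tok)
    return req


def serve(req: Request) -> Request:
    # In a real disaggregated deployment the KV cache is transferred between
    # nodes (e.g. over NVLink or RDMA); here the handoff is a function call.
    return decode_worker(prefill_worker(req))


result = serve(Request(prompt=["Hello", "world"], max_new_tokens=3))
print(result.output)  # -> ['tok0', 'tok1', 'tok2']
```

Separating the two phases lets each pool be sized and scheduled for its own bottleneck: prefill workers for raw compute on long prompts, decode workers for memory bandwidth and batching of many in-flight generations.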
Syllabus
Introducing NVIDIA Dynamo: A Distributed Inference Serving Framework for Reasoning Models
Taught by
NVIDIA Developer