Inference in the Age of Reasoning Models

NHR@FAU via YouTube

Overview

Explore distributed and disaggregated inference techniques for running large language models at scale in this 41-minute conference talk by Dr. Séverine Habert of NVIDIA. Discover how inference for reasoning and agentic AI systems can be optimized through architectural improvements including KV caching, prefix reuse, KV-cache-aware routing, and KV-cache offloading. Learn about performance enhancements that reduce latency and support efficient cluster-level deployment of inference workloads. Gain insight into current high-performance computing approaches to AI inference and their practical application to large-scale language model deployment.

Syllabus

HPC Café: Inference in the Age of Reasoning Models

Taught by

NHR@FAU
