Implementation and Optimization of MTP for DeepSeek R1 in TensorRT-LLM
Overview
Learn how to implement and optimize MTP (Multi-Token Prediction) speculative decoding for DeepSeek R1 using TensorRT-LLM in this 45-minute technical presentation from NVIDIA experts. Discover the fundamentals of the MTP method for large language model inference, explore its specific implementation within the TensorRT-LLM framework, and understand advanced optimization techniques to maximize performance gains. Gain insights into speculative decoding strategies that can significantly improve inference speed and efficiency for production-scale LLM deployments.
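To make the core idea concrete, here is a minimal sketch of the accept/reject loop at the heart of speculative decoding: a cheap drafter proposes several tokens ahead (as MTP heads do), the target model verifies them in order, and the longest agreeing prefix is accepted. Both "models" below are toy stand-ins invented for illustration, not DeepSeek R1 or any TensorRT-LLM API.

```python
def draft_propose(context, k):
    # Toy draft model: predicts each next token as (last + 1) mod 100,
    # k tokens ahead. A real MTP head would run the drafter network.
    out = []
    last = context[-1]
    for _ in range(k):
        last = (last + 1) % 100
        out.append(last)
    return out

def target_next(context):
    # Toy target model: agrees with the draft except when the last
    # token is a multiple of 7, where it diverges.
    last = context[-1]
    return (last + 1) % 100 if last % 7 != 0 else (last + 2) % 100

def speculative_step(context, k=4):
    """Run one speculative step: verify k drafted tokens against the
    target and return the accepted tokens (at least one per step)."""
    draft = draft_propose(context, k)
    accepted = []
    ctx = list(context)
    for tok in draft:
        tgt = target_next(ctx)
        if tgt == tok:
            accepted.append(tok)   # target agrees: token accepted "for free"
            ctx.append(tok)
        else:
            accepted.append(tgt)   # disagreement: keep the target's token, stop
            break
    return accepted

print(speculative_step([1], k=4))  # all 4 drafted tokens accepted
print(speculative_step([0], k=4))  # immediate divergence: 1 token
```

When the drafter agrees often, each target forward pass yields several tokens instead of one, which is where the speedup comes from; the divergence case shows output quality is preserved because the target always has the final say.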
Syllabus
Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM
Taught by
NVIDIA Developer