Overview
Learn how to implement and optimize MTP (Multi-Token Prediction) speculative decoding for DeepSeek R1 using TensorRT-LLM in this 45-minute technical presentation from NVIDIA experts. The talk covers the fundamentals of the MTP method for large language model inference, its implementation within the TensorRT-LLM framework, and the optimization techniques used to increase its performance gains. It also covers speculative decoding strategies that can improve inference speed and efficiency in production-scale LLM deployments.
Syllabus
Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM
Taught by
NVIDIA Developer