Overview
Learn how to implement and optimize MTP (Multi-Token Prediction) speculative decoding for DeepSeek R1 using TensorRT-LLM in this 45-minute technical presentation from NVIDIA experts. Discover the fundamentals of the MTP method for large language model inference, explore its implementation within the TensorRT-LLM framework, and understand the optimization techniques used to maximize performance gains. Gain insights into speculative decoding strategies that can improve inference speed and efficiency for production-scale LLM deployments.
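For readers new to the topic, the draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is a toy illustration under stated assumptions, not material from the presentation and not the TensorRT-LLM API: `target_next` and `draft_next` are hypothetical stand-ins for the full model and a cheap predictor, decoding is greedy, and the verification loop stands in for what a real system would do in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(prefix: List[Token], target_next: NextToken,
                     draft_next: NextToken, k: int) -> List[Token]:
    """Draft k tokens cheaply, then keep only what the target agrees with."""
    # 1) Draft phase: the cheap predictor proposes k tokens autoregressively.
    draft: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2) Verify phase: compare each drafted token with the target's choice.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # All k drafts accepted: the target also yields one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prompt: List[Token], target_next: NextToken, draft_next: NextToken,
             k: int, max_new: int) -> Tuple[List[Token], int]:
    """Run speculative steps until max_new tokens exist; return (tokens, steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, target_next, draft_next, k))
        steps += 1
    return out[:len(prompt) + max_new], steps


# Hypothetical toy models: the target counts upward mod 10; the draft agrees
# except right after token 4, where it guesses wrong.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, steps = generate([0], toy_target, toy_draft, k=3, max_new=12)
    print(tokens, steps)
```

With greedy verification the output matches what the target alone would produce; the saving comes from needing fewer verification steps than generated tokens whenever drafts are accepted.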
Syllabus
Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM
Taught by
NVIDIA Developer