
Implementation and Optimization of MTP for DeepSeek R1 in TensorRT-LLM

Nvidia via YouTube

Overview

Learn how to implement and optimize MTP (Multi-Token Prediction) speculative decoding for DeepSeek R1 using TensorRT-LLM in this 45-minute technical presentation from Nvidia experts. The talk covers the fundamentals of the MTP method for large language model inference, its specific implementation within the TensorRT-LLM framework, and advanced optimization techniques for maximizing performance gains. It offers insight into speculative decoding strategies that can significantly improve inference speed and efficiency for production-scale LLM deployments.
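To make the draft-and-verify idea behind MTP-style speculative decoding concrete, here is a minimal toy sketch of the general accept/reject loop. The draft head and target model below are hypothetical stand-in functions over integer "tokens", not the actual DeepSeek R1 or TensorRT-LLM APIs; real systems verify all draft tokens in a single batched forward pass.

```python
def draft_tokens(prefix, k):
    # Hypothetical cheap draft head: correct most of the time,
    # deliberately wrong at position i == 2 to show rejection.
    return [(prefix[-1] + i + 1 + (i == 2)) % 100 for i in range(k)]

def target_next(prefix):
    # Hypothetical expensive target model: the "true" next token.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix, k=4):
    """Propose k draft tokens, accept the longest prefix the target
    model agrees with, and emit one corrected token on mismatch."""
    drafts = draft_tokens(prefix, k)
    accepted = []
    ctx = list(prefix)
    for d in drafts:
        t = target_next(ctx)
        if t == d:
            accepted.append(d)      # draft verified; keep extending
            ctx.append(d)
        else:
            accepted.append(t)      # target's token replaces the bad draft
            break
    else:
        # All drafts accepted; the target contributes one bonus token.
        accepted.append(target_next(ctx))
    return accepted

tokens = [7]
tokens += speculative_step(tokens)
print(tokens)  # → [7, 8, 9, 10]: two drafts accepted, third corrected
```

The payoff is that each call to the expensive target model can yield several accepted tokens instead of one, which is where the speedup discussed in the talk comes from.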

Syllabus

Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM

Taught by

NVIDIA Developer

