Overview
Learn advanced techniques for optimizing DeepSeek R1 deployment with TensorRT-LLM to minimize latency on NVIDIA's Blackwell GPUs. Explore the optimization strategies and implementation methods behind world-record performance, including GPU acceleration, memory optimization, and inference optimization patterns designed for large language model deployment. See the methodologies and best practices NVIDIA experts use to maximize computational efficiency and minimize response times in production, along with hardware-software co-optimization techniques that take advantage of the Blackwell architecture.
Syllabus
DeepSeek R1 performance optimization to push the latency performance boundary
Taught by
NVIDIA Developer