

Introduction of Inference Time Compute Support in TensorRT-LLM

Nvidia via YouTube

Overview

Learn about scaffolding, a new framework in TensorRT-LLM designed to support various inference-time compute methods, including majority vote, best-of-N, and Monte Carlo Tree Search (MCTS), in this 56-minute technical presentation from Nvidia experts. Discover how to implement custom inference-time compute methods using the scaffolding framework, and understand how it balances usability, modularity, and high performance. Explore community contributions and gain insights into extending TensorRT-LLM's capabilities for advanced inference-time compute scenarios.
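To give a feel for what "inference-time compute" means here, the sketch below illustrates the majority-vote idea in plain Python: sample N completions for the same prompt and return the most common answer. This is a conceptual illustration only, not TensorRT-LLM's actual scaffolding API; the `generate` callable is a hypothetical stand-in for any LLM sampling function.

```python
from collections import Counter

def majority_vote(generate, prompt, n=5):
    """Sample n completions for the same prompt and return the
    answer that appears most often (ties broken by first seen)."""
    answers = [generate(prompt) for _ in range(n)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

# Hypothetical generator that returns varying answers across samples
_samples = iter(["42", "42", "41", "42", "40"])
fake_generate = lambda prompt: next(_samples)

print(majority_vote(fake_generate, "What is 6 * 7?", n=5))  # -> 42
```

Best-of-N follows the same pattern but scores each sample with a reward model and keeps the highest-scoring one, while MCTS explores a tree of partial generations; the scaffolding framework described in the talk is designed to let such methods be composed and customized.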

Syllabus

Introduction of inference time compute support in TensorRT-LLM

Taught by

NVIDIA Developer

