Overview
Learn about disaggregated serving in TensorRT-LLM through this 37-minute technical presentation from NVIDIA experts. Discover the potential benefits of a disaggregated serving architecture and gain practical knowledge of how to implement it with TensorRT-LLM. Explore current performance metrics and benchmarks for popular large language models served this way, and understand how the technique can improve resource utilization and scalability in production environments.
Syllabus
Introduction to disaggregated serving in TensorRT-LLM
Taught by
NVIDIA Developer