Comprehensive GPU Benchmarking and FP8 Format Analysis for LLM Performance

Trelis Research via YouTube

Overview

This 47-minute technical video presents GPU benchmarking results and optimization techniques from tests across multiple GPU models. It reports three performance metrics: time to first token, token generation throughput, and price per million tokens. It then explains why the FP8 format is beneficial, compares floating point formats, and covers which GPU architectures support FP8. The video includes a live demonstration on an RTX 3090, a comparison of the SGLang and vLLM frameworks, and a walkthrough of creating and evaluating FP8 models using LLM Compressor for quantization. Accompanying resources include slides, code repositories, and documentation for applying these techniques in your own projects.
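
The price per million tokens metric can be derived from a GPU's hourly rental price and its token throughput. Below is a minimal sketch of that arithmetic, assuming a fixed hourly price and a steady aggregate throughput. The function name and the example numbers are hypothetical and are not results from the video.

```python
def price_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost of generating one million tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Seconds needed to produce one million tokens at the given rate.
    seconds_per_million = 1_000_000 / tokens_per_second
    # Convert the hourly price to a per-second price and scale it.
    return gpu_price_per_hour * seconds_per_million / 3600


# Hypothetical inputs: a GPU rented at $0.50/hour sustaining 1000 tokens/s.
example_cost = price_per_million_tokens(gpu_price_per_hour=0.50, tokens_per_second=1000.0)
print(f"${example_cost:.4f} per million tokens")
```

Under this formula, a more expensive GPU can still come out cheaper per token if its throughput is proportionally higher.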

Syllabus

- Introduction to GPU benchmarking test
- Test methodology and parameters
- Results: Time to first token
- Results: Token generation throughput
- Results: Price per million tokens
- Analysis of results ordering and trends
- Detailed discussion topics overview
- Why FP8 format is beneficial
- Technical explanation of FP8
- Comparison of different floating point formats
- GPU architecture support for FP8
- How FP8 works on different GPU architectures
- Live demonstration with RTX 3090
- SGLang vs vLLM comparison
- Caveats and limitations of benchmarking
- How to create FP8 models
- Using LLM Compressor for quantization
- Evaluating quantized models
- Conclusion and resources
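
To make the floating point format comparison in the syllabus concrete, here is a rough sketch that computes the largest finite value of a few formats from their exponent and mantissa bit counts. It assumes the commonly used FP8 definitions (E4M3 without infinities, E5M2 laid out like IEEE formats); the video may use different conventions, and the helper name `max_finite` is hypothetical.

```python
def max_finite(exp_bits: int, man_bits: int, ieee_like: bool = True) -> float:
    """Largest finite value of a binary floating point format."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # The all-ones exponent code is reserved for infinity and NaN.
        max_exp = (2 ** exp_bits - 2) - bias
        max_significand = 2 - 2 ** -man_bits
    else:
        # E4M3-style assumption: the all-ones exponent code is usable,
        # and only the all-ones mantissa in that code encodes NaN.
        max_exp = (2 ** exp_bits - 1) - bias
        max_significand = 2 - 2 * 2 ** -man_bits
    return max_significand * 2.0 ** max_exp


formats = {
    "FP8 E4M3": max_finite(4, 3, ieee_like=False),
    "FP8 E5M2": max_finite(5, 2),
    "FP16": max_finite(5, 10),
    "BF16": max_finite(8, 7),
}
for name, value in formats.items():
    print(f"{name}: max finite value = {value:g}")
```

The two FP8 variants trade range for precision: E5M2 spends an extra bit on the exponent and so reaches much larger values, while E4M3 keeps an extra mantissa bit for finer steps between values.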

Taught by

Trelis Research
