Comprehensive GPU Benchmarking and FP8 Format Analysis for LLM Performance
- 1 - Introduction to GPU benchmarking test
- 2 - Test methodology and parameters
- 3 - Results: Time to first token
- 4 - Results: Token generation throughput
- 5 - Results: Price per million tokens
- 6 - Analysis of results ordering and trends
- 7 - Detailed discussion topics overview
- 8 - Why FP8 format is beneficial
- 9 - Technical explanation of FP8
- 10 - Comparison of different floating point formats
- 11 - GPU architecture support for FP8
- 12 - How FP8 works on different GPU architectures
- 13 - Live demonstration with RTX 3090
- 14 - SGLang vs vLLM comparison
- 15 - Caveats and limitations of benchmarking
- 16 - How to create FP8 models
- 17 - Using LLM Compressor for quantization
- 18 - Evaluating quantized models
- 19 - Conclusion and resources