Comprehensive GPU Benchmarking and FP8 Format Analysis for LLM Performance


Trelis Research via YouTube

Video 3 of 19 - Results: Time to first token

YouTube videos curated by Class Central.

Classroom Contents

  1. Introduction to GPU benchmarking test
  2. Test methodology and parameters
  3. Results: Time to first token
  4. Results: Token generation throughput
  5. Results: Price per million tokens
  6. Analysis of results ordering and trends
  7. Detailed discussion topics overview
  8. Why the FP8 format is beneficial
  9. Technical explanation of FP8
  10. Comparison of different floating-point formats
  11. GPU architecture support for FP8
  12. How FP8 works on different GPU architectures
  13. Live demonstration with RTX 3090
  14. SGLang vs vLLM comparison
  15. Caveats and limitations of benchmarking
  16. How to create FP8 models
  17. Using LLM Compressor for quantization
  18. Evaluating quantized models
  19. Conclusion and resources
