Syllabus
- Introduction to the GPU benchmarking test
- Test methodology and parameters
- Results: Time to first token
- Results: Token generation throughput
- Results: Price per million tokens
- Analysis of result rankings and trends
- Overview of the detailed discussion topics
- Why the FP8 format is beneficial
- Technical explanation of FP8
- Comparison of different floating-point formats
- GPU architecture support for FP8
- How FP8 works on different GPU architectures
- Live demonstration with an RTX 3090
- SGLang vs vLLM comparison
- Caveats and limitations of benchmarking
- How to create FP8 models
- Using LLM Compressor for quantization
- Evaluating quantized models
- Conclusion and resources
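The syllabus covers what FP8 is and how it compares with other floating-point formats. As a rough illustration (not material from the course itself), the Python sketch below decodes all 256 bit patterns under the two widely used FP8 layouts: E4M3 in its "fn" variant (no infinities; only the all-ones exponent with an all-ones mantissa is NaN, the convention used by the OCP FP8 spec and PyTorch's `torch.float8_e4m3fn`) and E5M2 (IEEE-style infinities and NaNs). The function and table names (`decode_fp8`, `FORMATS`, `summarize`) are hypothetical helpers for this sketch. It shows the core trade-off: E4M3 keeps an extra mantissa bit for precision but tops out at 448, while E5M2 reaches 57344 at the cost of coarser steps.

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int, ieee_special: bool) -> float:
    """Decode one 8-bit pattern (1 sign + exp_bits + man_bits) into a Python float.

    ieee_special=True  -> E5M2-style: all-ones exponent is inf (mantissa 0) or NaN.
    ieee_special=False -> E4M3 "fn"-style: no infinities; only the all-ones exponent
                          with an all-ones mantissa is NaN, other codes there are normal.
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if exp == exp_max:
        if ieee_special:
            return sign * math.inf if man == 0 else math.nan
        if man == man_max:
            return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


# Hypothetical table of the two common FP8 layouts.
FORMATS = {
    "E4M3": dict(exp_bits=4, man_bits=3, bias=7, ieee_special=False),
    "E5M2": dict(exp_bits=5, man_bits=2, bias=15, ieee_special=True),
}


def summarize(name: str) -> dict:
    """Scan every bit pattern and report the representable positive range."""
    cfg = FORMATS[name]
    values = [decode_fp8(b, **cfg) for b in range(256)]
    finite_pos = [v for v in values if math.isfinite(v) and v > 0]
    return {
        "max": max(finite_pos),
        "min_normal": 2.0 ** (1 - cfg["bias"]),
        "min_subnormal": min(finite_pos),
        "step_at_1": 2.0 ** -cfg["man_bits"],  # spacing between 1.0 and the next value
    }


if __name__ == "__main__":
    for name in FORMATS:
        print(name, summarize(name))
```

Scanning all 256 codes rather than applying a closed-form formula keeps the special-value rules explicit, which matters because the two layouts treat the all-ones exponent differently.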
Taught by
Trelis Research