Overview
Syllabus
- Introduction to the GPU benchmarking test
- Test methodology and parameters
- Results: Time to first token
- Results: Token generation throughput
- Results: Price per million tokens
- Analysis of results ordering and trends
- Detailed discussion topics overview
- Why the FP8 format is beneficial
- Technical explanation of FP8
- Comparison of different floating point formats
- GPU architecture support for FP8
- How FP8 works on different GPU architectures
- Live demonstration with RTX 3090
- SGLang vs vLLM comparison
- Caveats and limitations of benchmarking
- How to create FP8 models
- Using LLM Compressor for quantization
- Evaluating quantized models
- Conclusion and resources
Taught by
Trelis Research