Syllabus
- DeepSeek-V3 performance
- Performance comparison with Claude Sonnet and GPT-4o
- Speed tests versus Claude Sonnet and GPT-4o
- Discussion of model size and deployment requirements for self-hosting
- Analysis of GPU types and export restrictions
- Explanation of training efficiency improvements
- Overview of model architecture evolution from 2022 to 2024
- Introduction of Mixture of Experts concept
- Discussion of load balancing problems
- Explanation of DeepSeek's load-balancing solution: the auxiliary-loss-free approach
- Introduction of three additional DeepSeek optimisation techniques: FP8 training, Multi-head Latent Attention (MLA), and multi-token prediction
- Discussion of 8-bit (FP8) training
- Explanation of compressed attention via Multi-head Latent Attention (MLA)
- Details of multi-token prediction
- Benefits of speculative decoding
- Conclusion and summary of Deepseek improvements
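The Mixture of Experts concept covered in the syllabus can be sketched minimally as top-k gated routing: a gate scores every expert for a token, only the top-k experts run, and their outputs are mixed by renormalised gate probabilities. The expert functions, gate weights, and dimensions below are illustrative toy values, not DeepSeek's actual architecture:

```python
import math
import random

random.seed(0)

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_weights, top_k=2):
    """Route a token to its top-k experts and mix their outputs
    by the renormalised gate probabilities (sparse activation)."""
    # Gate: one score per expert (here a simple dot product).
    scores = [sum(w * x for w, x in zip(gw, token)) for gw in gate_weights]
    probs = softmax(scores)
    # Select only the top-k experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted sum of the chosen experts' outputs.
    out = [0.0] * len(token)
    for i in top:
        y = experts[i](token)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, top

# Toy setup: 4 "experts", each a fixed scalar multiplier (hypothetical).
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in experts]
out, chosen = moe_forward([0.5, -0.2, 0.1], experts, gate_weights, top_k=2)
print(chosen, out)
```

Because only k of the experts execute per token, compute scales with k rather than with the total expert count, which is the efficiency argument behind MoE models such as DeepSeek-V3.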
Taught by
Trelis Research