YouTube videos curated by Class Central.
Classroom Contents
Deepseek V3 Architecture and Performance Optimization - From Training to Deployment
- 1 - Deepseek V3 performance
- 2 - Performance comparison with Claude Sonnet and GPT-4o
- 3 - Speed tests vs Sonnet and GPT-4o
- 4 - Discussion of model size and deployment requirements for self-hosting
- 5 - Analysis of GPU types and export restrictions
- 6 - Explanation of training efficiency improvements
- 7 - Overview of model architecture evolution over 2022-2024
- 8 - Introduction of Mixture of Experts concept
- 9 - Discussion of load balancing problems
- 10 - Explanation of Deepseek's load balancing solution: the auxiliary-loss-free approach
- 11 - Introduction of three additional Deepseek optimisation techniques: FP8 training, MLA, and multi-token prediction
- 12 - Discussion of 8-bit training
- 13 - Explanation of compressed attention: Multi-head Latent Attention (MLA)
- 14 - Details of multi-token prediction
- 15 - Benefits of speculative decoding
- 16 - Conclusion and summary of Deepseek improvements