Deepseek V3 Architecture and Performance Optimization - From Training to Deployment

Trelis Research via YouTube


Contents

  1. Deepseek V3 performance
  2. Performance comparison with Claude Sonnet and GPT-4o
  3. Speed tests vs Sonnet and GPT-4o
  4. Discussion of model size and deployment requirements for self-hosting
  5. Analysis of GPU types and export restrictions
  6. Explanation of training efficiency improvements
  7. Overview of model architecture evolution from 2022 to 2024
  8. Introduction of the Mixture of Experts concept
  9. Discussion of load balancing problems
  10. Explanation of Deepseek's load balancing solution: the auxiliary-loss-free approach
  11. Introduction of three additional Deepseek optimisation techniques: FP8 training, MLA, and multi-token prediction
  12. Discussion of 8-bit training
  13. Explanation of compressed attention (Multi-head Latent Attention, MLA)
  14. Details of multi-token prediction
  15. Benefits of speculative decoding
  16. Conclusion and summary of Deepseek improvements
