Overview
Learn to implement DeepSeek V3 from scratch in this 3-hour 47-minute Python course. Instructor @vukrosic pairs theoretical explanations with hands-on coding for each part of the model. Topics include attention mechanisms, Query-Key-Value operations, the KV cache, Multi-head Latent Attention (MLA), Rotary Position Embedding (RoPE), Mixture of Experts (MoE), gating mechanisms, and transformer blocks. The course references the DeepSeek V3 paper and provides access to inference code that can be modified for training. The curriculum starts with fundamental attention concepts and builds toward complete transformer blocks, with a coding session for each component.
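To give a feel for the first topics in the syllabus, here is a minimal sketch of scaled dot-product attention for a single query, combined with a simple KV cache that stores keys and values from earlier steps. It is not taken from the course or from the DeepSeek V3 code: the names `softmax`, `attend`, and `KVCache` are hypothetical, it uses plain Python lists instead of tensors, and it caches full keys and values rather than the compressed latent representation that MLA uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  list of floats, length d
    keys:   list of key vectors, each of length d
    values: list of value vectors, one per key
    """
    scale = 1.0 / math.sqrt(len(query))
    # One similarity score per cached key.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Illustrative cache: keeps every past key and value so that each
    decoding step only has to compute attention for the newest query."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append the new token's key/value, then attend over everything cached.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


cache = KVCache()
first = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 3.0])
second = cache.step([0.0, 0.0], [0.0, 1.0], [4.0, 5.0])
```

With only one cached entry, the first step returns that entry's value unchanged. In the second step the all-zero query scores both keys equally, so the output is the average of the two cached values.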
Syllabus
⌨️ 0:00:00 Intro
⌨️ 0:01:40 Attention Mechanism
⌨️ 0:13:34 Query, Key, Value
⌨️ 0:34:11 KV Cache
⌨️ 0:39:06 Multi-head Latent Attention (MLA)
⌨️ 0:58:53 Coding MLA
⌨️ 1:28:41 RoPE
⌨️ 1:55:44 Coding KV Cache
⌨️ 2:00:25 MLA forward
⌨️ 2:28:24 MoE, Gate
⌨️ 2:49:25 Gate code
⌨️ 3:09:10 MoE code
⌨️ 3:28:36 Transformer Blocks
Taught by
freeCodeCamp.org