Overview
Learn to implement DeepSeek V3 from scratch in this comprehensive 3-hour 47-minute Python course. Follow along as instructor @vukrosic pairs theoretical explanations with hands-on coding sessions for building this cutting-edge deep learning model. Master key concepts including attention mechanisms, Query-Key-Value operations, the KV Cache, Multihead Latent Attention (MLA), Rotary Position Embedding (RoPE), Mixture of Experts (MoE), gating mechanisms, and transformer blocks. The course references the DeepSeek V3 paper and provides access to inference code that can be modified for training purposes. The curriculum starts from fundamental attention concepts and builds step by step toward a complete transformer architecture, with a practical coding session for each component.
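To give a flavor of the attention and Query-Key-Value material the course opens with, here is a minimal NumPy sketch of scaled dot-product attention. This is an illustrative example, not the course's own code; the shapes and variable names are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Basic attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are (seq_len, d_k) arrays; each output row is a
    weighted average of the value vectors.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8-dim heads (illustrative sizes)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one attended vector per query token
```

Techniques covered later in the course, such as the KV Cache and MLA, build on this same operation by reusing or compressing the `K` and `V` tensors.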
Syllabus
⌨️ 0:00:00 Intro
⌨️ 0:01:40 Attention Mechanism
⌨️ 0:13:34 Query, Key, Value
⌨️ 0:34:11 KV Cache
⌨️ 0:39:06 Multihead Latent Attention (MLA)
⌨️ 0:58:53 Coding MLA
⌨️ 1:28:41 RoPE
⌨️ 1:55:44 Coding KV Cache
⌨️ 2:00:25 MLA forward
⌨️ 2:28:24 MoE, Gate
⌨️ 2:49:25 Gate code
⌨️ 3:09:10 MoE code
⌨️ 3:28:36 Transformer Blocks
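As a companion to the MoE and gating sections of the syllabus (2:28:24 onward), here is a minimal NumPy sketch of top-k expert routing. It is a simplified illustration under assumed names and shapes, not the DeepSeek V3 implementation built in the course.

```python
import numpy as np

def top_k_gate(x, W_gate, k=2):
    """Score all experts for token x and keep the top-k, softmax-normalized."""
    logits = x @ W_gate                      # one score per expert
    top = np.argsort(logits)[-k:]            # indices of the k highest scores
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()                  # expert ids, normalized weights

def moe_forward(x, W_gate, experts, k=2):
    """Combine only the selected experts' outputs, weighted by the gate."""
    ids, weights = top_k_gate(x, W_gate, k)
    return sum(w * experts[i](x) for i, w in zip(ids, weights))

rng = np.random.default_rng(0)
d, n_experts = 8, 4                          # illustrative sizes
W_gate = rng.normal(size=(d, n_experts))
# each "expert" here is just a linear map for illustration
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in expert_mats]
x = rng.normal(size=d)
y = moe_forward(x, W_gate, experts)
print(y.shape)  # (8,)
```

Only `k` of the `n_experts` expert networks run per token, which is what lets MoE models scale parameter count without a proportional increase in compute.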
Taught by
freeCodeCamp.org