In this course, you will dive into the core mechanics of heterogeneous computing using C++ and CUDA. You will learn how to allocate GPU memory safely, transfer datasets correctly between the CPU and GPU, and configure block dimensions to execute and verify your first one-dimensional kernels.
Syllabus
- Unit 1: Device Memory Allocation
  - Safe CUDA Calls
  - Build the Memory Helper
  - Hunt the Memory Bug
  - Safe GPU Cleanup
  - Closing the Memory Loop
  - Triple Array Challenge
- Unit 2: Host to Device Transfer
  - Moving Data to Device
  - Bringing Results Home
  - Debugging CUDA Data Transfer
  - Complete CUDA Transfer Flow
- Unit 3: 1D Vector Addition Kernel
  - Finding Thread Positions
  - Guarding Vector Addition
  - Cover Every Vector Element
  - A New Vector Operation
  - Making GPU Results Reliable
  - Scaling a Vector on CUDA
- Unit 4: Robust Grid Stride Loops
  - Safe Steps in CUDA
  - Stride Across the Grid
  - Launching the Full Grid
  - From Threads to Strides
  - Multiply Across the Grid
  - Grid Stride Detective
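
As a rough preview of the indexing arithmetic behind Units 3 and 4, the sketch below emulates it on the CPU in plain C++, so it runs without a GPU. It is not taken from the course materials: the names `blocks_for` and `emulate_grid_stride_add` are hypothetical, and the nested loops only stand in for what CUDA threads would do in parallel, with the local variables mirroring the built-ins `blockIdx.x`, `blockDim.x`, `threadIdx.x`, and `gridDim.x`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the course): smallest block count such that
// blocks * threads_per_block >= n, computed with ceiling division.
inline int blocks_for(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of a grid-stride vector addition (illustrative only).
// On a GPU the two outer loops would not exist: each (block, thread) pair
// runs concurrently as one CUDA thread.
inline std::vector<float> emulate_grid_stride_add(const std::vector<float>& a,
                                                  const std::vector<float>& b,
                                                  int grid_dim, int block_dim) {
    assert(a.size() == b.size());
    assert(grid_dim > 0 && block_dim > 0);
    const int n = static_cast<int>(a.size());
    std::vector<float> out(a.size(), 0.0f);

    // Total number of threads in the grid: gridDim.x * blockDim.x.
    const int stride = grid_dim * block_dim;

    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Global thread position: blockIdx.x * blockDim.x + threadIdx.x.
            const int start = block_idx * block_dim + thread_idx;
            // The loop condition doubles as the bounds guard (i < n), and the
            // stride lets a small grid still cover every element.
            for (int i = start; i < n; i += stride) {
                out[i] = a[i] + b[i];
            }
        }
    }
    return out;
}
```

With 1000 elements and 256 threads per block, `blocks_for(1000, 256)` gives 4 blocks, which is 1024 threads. The 24 surplus threads are why a bounds guard is needed. With a grid-stride loop the launch can instead use fewer threads than elements, as in `emulate_grid_stride_add(a, b, 2, 2)` on 10-element vectors, where each of the 4 emulated threads handles two or three elements.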