Overview
Learn CUDA from the ground up by managing GPU memory, launching 1D and 2D kernels, optimizing with shared memory, and building practical image-processing pipelines for graphics-oriented parallel computing.
Syllabus
- Course 1: CUDA Basics and 1D Operations
- Course 2: 2D Grids and Matrix Math
- Course 3: Shared Memory Optimization
- Course 4: Image Processing with CUDA
Courses
- Course 1: CUDA Basics and 1D Operations
In this course, you will dive into the core mechanics of heterogeneous computing using C++ and CUDA. You will learn how to allocate GPU memory safely, transfer datasets correctly between the CPU and GPU, and configure block dimensions to execute and verify your first one-dimensional kernels.
- Course 2: 2D Grids and Matrix Math
Master 2D thread indexing in CUDA by mapping multidimensional grids to linear memory. You will implement row-major indexing, build a naive matrix multiplication kernel, and apply 2D grid-stride loops. With grid-stride loops, a fixed-size grid can cover rectangular datasets larger than the launch itself, so the same kernel stays correct across input sizes and GPU architectures.
- Course 3: Shared Memory Optimization
In this course, you will reduce costly global memory accesses by staging data in fast, on-chip shared memory. You will learn to synchronize the threads of a block as they cooperate through shared memory, implement a boundary-safe tiled matrix multiplication algorithm, and empirically compare it against a naive implementation using validated benchmarks.
- Course 4: Image Processing with CUDA
Apply your CUDA skills to image buffers by processing pixels on the GPU. You will map RGB data to threads, handle boundary conditions, and build a pipeline chaining grayscale conversion and Sobel edge detection. Finally, you will validate and export results, mastering real-world GPU image manipulation.
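To give a flavor of the row-major indexing and RGB mapping mentioned above, here is a minimal CPU-side sketch in C++ of the kind of reference routine often used to validate a GPU grayscale kernel. It is not taken from the course materials: the names `pixelIndex` and `grayscaleReference` are hypothetical, and the interleaved 8-bit RGB layout and the BT.601 luma weights are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major offset of pixel (x, y) in an image that is `width` pixels wide.
// In a CUDA kernel, x and y would typically be derived from blockIdx,
// blockDim, and threadIdx; here they are plain loop counters.
inline std::size_t pixelIndex(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}

// CPU reference: convert interleaved 8-bit RGB (assumed layout: R, G, B per
// pixel) to 8-bit grayscale. The weights are the BT.601 luma coefficients,
// chosen here as an assumption; a course may use different ones.
std::vector<std::uint8_t> grayscaleReference(const std::vector<std::uint8_t>& rgb,
                                             std::size_t width,
                                             std::size_t height) {
    std::vector<std::uint8_t> gray(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t p = pixelIndex(x, y, width);
            const float r = rgb[3 * p + 0];
            const float g = rgb[3 * p + 1];
            const float b = rgb[3 * p + 2];
            // Add 0.5 before truncating so the result is rounded to nearest.
            gray[p] = static_cast<std::uint8_t>(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);
        }
    }
    return gray;
}
```

On the GPU, each thread would usually compute one such pixel, with a bounds check on x and y standing in for the loop limits. Comparing the kernel's output against a reference like this is one simple way to validate results.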