Implementing New Algorithm with CUDA Kernels - CUDA C++ Class Part 3

NVIDIA via YouTube

Overview

Learn to implement new algorithms on the GPU with CUDA kernels in this video tutorial, Part 3 of NVIDIA's Modern CUDA C++ Programming Class. The session starts with basic kernel concepts and progresses to atomic operations, thread scope, the streaming multiprocessor (SM) architecture, shared memory, and cooperative algorithms built on the CUB library's parallel primitives.

Hands-on exercises cover symmetry operations and a series of histogram implementations that move from fixing race conditions with atomic operations, to privatized histograms, to shared-memory optimization, and finally to a cooperative histogram. Debugging tools for CUDA code are introduced along the way.

Accompanying slides and Google Colab exercises let you run GPU code for free, and each programming challenge comes with a step-by-step solution. The class is designed for C++ developers who want to write clean, efficient, idiomatic GPU code using modern CUDA practices, whether they are new to CUDA or modernizing existing GPU applications.

Syllabus

00:00:00 Introduction
00:00:22 CUDA Kernels
00:17:30 Exercise Symmetry
00:18:32 Solution Symmetry
00:19:20 Exercise Row Symmetry
00:19:38 Solution Row Symmetry
00:21:38 Debugging Tools and Atomic Operations
00:36:38 Exercise Fix Histogram
00:36:59 Solution Fix Histogram
00:38:18 Privatized Histogram and Thread Scope
00:47:36 Exercise Fix Histogram 2
00:48:08 Solution Fix Histogram 2
00:49:35 SM and Shared Memory
00:55:45 Exercise Optimize Histogram
00:56:05 Solution Optimize Histogram
00:57:58 CUB
01:05:05 Exercise Cooperative Histogram
01:05:24 Solution Cooperative Histogram
01:06:06 Takeaways
01:08:17 Final Review
01:10:43 Final Assessment

Taught by

NVIDIA Developer
