Asynchrony and CUDA Streams - CUDA C++ Class Part 2

NVIDIA via YouTube

Overview

This 48-minute video from NVIDIA's Modern CUDA C++ Programming Class covers asynchronous programming and CUDA streams. It begins with the difference between synchronous and asynchronous execution and shows how to overlap computation with I/O. It then introduces Nsight Systems for performance profiling and NVTX (NVIDIA Tools Extension) for annotating code under analysis. The CUDA streams segment covers running work in parallel, copying memory asynchronously, and using pinned memory so that copies can overlap with other work. Five hands-on exercises (compute-IO overlap, Nsight Systems, NVTX, async copy, and copy overlap) each come with a complete solution. Accompanying slides and Google Colab notebooks let you run the GPU exercises for free.

Syllabus

00:00:00 Introduction
00:00:22 Synchronous vs Asynchronous
00:08:32 Exercise Compute-IO Overlap
00:09:16 Solution Compute-IO Overlap
00:10:43 Nsight Systems
00:11:35 Exercise Nsight Systems
00:14:38 Solution Nsight Systems
00:17:01 NVTX
00:19:50 Exercise NVTX
00:20:22 Solution NVTX
00:21:19 Stream
00:35:42 Exercise Async Copy
00:36:20 Solution Async Copy
00:38:36 Pinned Memory
00:42:50 Exercise Copy Overlap
00:43:23 Solution Copy Overlap
00:44:21 Takeaways

Taught by

NVIDIA Developer
