Overview
Learn asynchronous programming with CUDA Streams in this 48-minute video tutorial from NVIDIA's Modern CUDA C++ Programming Class. The tutorial contrasts synchronous and asynchronous execution, shows how to overlap computation with I/O, and introduces Nsight Systems for performance profiling, along with NVTX (NVIDIA Tools Extension) annotations for performance analysis and debugging of CUDA applications. It then covers CUDA Streams for parallel execution, asynchronous memory copies, and pinned memory for faster transfers. Hands-on exercises on compute-IO overlap, Nsight Systems profiling, NVTX annotations, async copy, and copy overlap each come with a complete solution. Accompanying slides and Google Colab notebooks let you run the GPU exercises for free.
Syllabus
00:00:00 Introduction
00:00:22 Synchronous vs Asynchronous
00:08:32 Exercise Compute-IO Overlap
00:09:16 Solution Compute-IO Overlap
00:10:43 Nsight Systems
00:11:35 Exercise Nsight Systems
00:14:38 Solution Nsight Systems
00:17:01 NVTX
00:19:50 Exercise NVTX
00:20:22 Solution NVTX
00:21:19 Stream
00:35:42 Exercise Async Copy
00:36:20 Solution Async Copy
00:38:36 Pinned Memory
00:42:50 Exercise Copy Overlap
00:43:23 Solution Copy Overlap
00:44:21 Takeaways
Taught by
NVIDIA Developer