Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Coursera

GPU Programming with C++ and CUDA

Packt via Coursera

Overview

In this course, you’ll master GPU programming using C++ and CUDA to significantly enhance your software's performance. By focusing on parallelism, you’ll learn to leverage the full power of GPUs for high-performance computing applications. You will acquire practical knowledge of managing GPU devices, optimizing GPU resource usage, and integrating GPU code with Python to build scalable and efficient applications.

This course combines fundamental theory with hands-on applications, emphasizing real-world strategies for optimizing performance and building reusable libraries. You'll not only understand the core concepts but also implement them in real-world projects, such as creating libraries for Python integration.

Ideal for C++ developers with experience in basic programming concepts, this course will take you through advanced topics, from parallel algorithms to multi-GPU usage. A background in operating systems is recommended for tackling the more complex concepts. Based on the book GPU Programming with C++ and CUDA, by Paulo Motta.
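To give a flavor of the kind of code the course builds toward, here is a minimal CUDA vector-addition kernel. This is an illustrative sketch, not an excerpt from the course materials; it uses unified memory (`cudaMallocManaged`) for brevity, while the course also covers explicit host/device transfers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // One thread per element, rounded up to whole blocks.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %.1f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with `nvcc vecadd.cu -o vecadd` on a machine with the CUDA toolkit installed.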

Syllabus

  • Introduction to Parallel Programming
    • In this section, we explore parallelism in software, its importance, and the differences between CPU and GPU architectures to build a foundation for GPU programming.
  • Setting Up Your Development Environment
    • In this section, we configure a GPU environment using Docker, locate official Linux documentation, and install the CUDA toolkit on Ubuntu 20.04 or 22.04 for AI and machine learning workflows.
  • Hello CUDA
    • In this section, we introduce GPU programming fundamentals, including kernel execution, device inspection, and setting up a working environment for CUDA development.
  • Hello Again, but in Parallel
    • In this section, we explore SIMD execution, data movement, and parallel vector addition for GPU programming.
  • A Closer Look into the World of GPUs
    • In this section, we explore GPU thread, block, and grid configurations, asynchronous data transfer, streams, events, and shared memory to optimize performance in parallel computing.
  • Parallel Algorithms with CUDA
    • In this section, we explore parallel algorithm design, focusing on matrix operations, reduction, and workload balancing for efficient GPU execution.
  • Performance Strategies
    • In this section, we explore GPU optimization strategies and profiling with NVIDIA Nsight Compute.
  • Overlaying Multiple Operations
    • In this section, we explore debugging CUDA code with VS Code, using CUDA streams to overlap memory and kernel operations, and configuring multiple GPUs for parallel processing.
  • Exposing Your Code to Python
    • In this section, we explore methods to integrate C++ GPU code with Python, focusing on Ctypes, custom wrappers, and performance analysis for efficient cross-language execution.
  • Exploring Existing GPU Models
    • In this section, we explore GPU development using cuBLAS and Thrust, optimize code for memory and thread efficiency, and test with GTest and Pytest to ensure reliability and performance.
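The shared-memory and reduction topics above can be sketched with a classic block-level tree reduction. This is a simplified illustration under the assumption of a power-of-two block size, not the course's own implementation; production code would also handle warp-level optimizations.

```cuda
// Block-level sum reduction using shared memory (illustrative sketch).
// Each block writes one partial sum; a second pass (or host loop)
// combines the per-block results.
__global__ void blockSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x + tid;

    // Stage one element per thread into fast shared memory.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```

The kernel would be launched with dynamic shared memory sized to the block, e.g. `blockSum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);`.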

Taught by

Packt - Course Instructors
