
The Challenge of Managing Parallelism and Data-Movement for Tensor Computations on GPUs

Simons Institute via YouTube

Overview

Explore the complexities of optimizing tensor computations on GPU architectures in this lecture, which addresses the dual challenges of managing parallelism and minimizing data movement. Examine the fundamental issues that arise when executing tensor operations on GPUs: coordinating parallel execution while limiting data-transfer overhead, and working within the constraints and opportunities of GPU hardware, including memory hierarchies, thread organization, and bandwidth limitations that shape tensor computation performance.

Understand the trade-off between computational parallelism and data locality, and learn strategies for optimizing memory access patterns to maximize throughput. See how these challenges manifest in machine learning workloads, scientific computing applications, and high-performance computing environments. Finally, gain insight into current research directions for addressing these fundamental bottlenecks in GPU-accelerated tensor processing, including compiler optimizations, runtime scheduling strategies, and hardware-software co-design approaches that can significantly improve performance for tensor-intensive applications.
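As a concrete illustration of the parallelism-versus-locality trade-off discussed above, the sketch below (not from the lecture itself) shows loop tiling, a classic strategy for improving data locality: a matrix multiply is processed in small blocks so that each block of data is reused while it is still resident in fast memory (cache on CPUs, shared memory on GPUs). The tile size of 4 is an arbitrary choice for illustration.

```python
def matmul_tiled(A, B, n, tile=4):
    """Multiply two n x n matrices (lists of lists) with blocked (tiled) loops.

    The three outer loops walk over tiles; the three inner loops work
    entirely within one tile, so the accessed sub-blocks of A, B, and C
    stay small enough to remain in fast memory and be reused.
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Accumulate the product of one tile of A and one tile of B
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]  # reused across the whole jj tile
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C
```

On a GPU the same idea appears as staging tiles of the operands in shared memory, with one thread block per output tile; choosing the tile size is exactly the kind of parallelism/data-movement balancing act the lecture examines.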

Syllabus

The challenge of managing parallelism and data-movement for tensor computations on GPUs

Taught by

Simons Institute

