Shared Memory Optimization

via CodeSignal

Overview

In this course, you will reduce the impact of global memory latency by staging data in fast, on-chip shared memory. You will learn to synchronize the threads that cooperate through shared memory, implement a boundary-safe tiled matrix multiplication, and compare it empirically against a naive implementation using validated benchmarks.
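As a rough illustration of the boundary-safe tiled multiplication mentioned above, here is a minimal sketch. It assumes the course targets CUDA, which this page does not state. The names `matmulTiled`, `runAndCheck`, and `TILE`, the 16x16 tile size, and the test dimensions are hypothetical choices made for the example, not taken from the course material.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // hypothetical tile width; one thread per tile element

// Computes C (M x N) = A (M x K) * B (K x N), all row-major.
__global__ void matmulTiled(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    int numTiles = (K + TILE - 1) / TILE;  // round up so a partial last tile is covered
    for (int t = 0; t < numTiles; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as zero, so edge tiles add nothing.
        tileA[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // every load must finish before any thread reads the tiles

        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // every read must finish before the tiles are overwritten
    }
    if (row < M && col < N) C[row * N + col] = acc;  // guard the final store
}

// Runs the kernel and compares it against a plain CPU triple loop.
bool runAndCheck(int M, int K, int N) {
    std::vector<float> A(M * K), B(K * N), C(M * N), ref(M * N, 0.0f);
    for (int i = 0; i < M * K; ++i) A[i] = (i % 7) * 0.5f;
    for (int i = 0; i < K * N; ++i) B[i] = float(i % 5) - 2.0f;
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c)
            for (int k = 0; k < K; ++k)
                ref[r * N + c] += A[r * K + k] * B[k * N + c];

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, M, K, N);

    // The blocking copy also surfaces errors from the launch or earlier calls.
    cudaError_t err = cudaMemcpy(C.data(), dC, C.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaGetLastError();
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < M * N; ++i)
        if (std::fabs(C[i] - ref[i]) > 1e-3f) return false;
    return true;
}

int main() {
    // Dimensions deliberately not multiples of TILE, to exercise the edge guards.
    bool ok = runAndCheck(37, 53, 29);
    std::printf("tiled result %s the CPU reference\n", ok ? "matches" : "differs from");
    return ok ? 0 : 1;
}
```

The two `__syncthreads()` calls serve different purposes: the first separates loading from reading, and the second keeps a fast thread from overwriting a tile that slower threads in the block are still reading.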

Syllabus

  • Unit 1: Shared Memory Declaration
    • Shared Memory Mystery
    • Shared Memory Teamwork
    • Shared Tile Debugging
    • Building Block Cooperation
    • Indexing Under Pressure
  • Unit 2: Thread Synchronization
    • The Transpose Mystery
    • Completing the Tile Transpose
    • Barrier Trouble in Transpose
    • Shared Memory Row Flip
    • Mirror Tile Challenge
  • Unit 3: Implementing Tiled Computation
    • Shared Memory Race Rescue
    • Guarding Edge Tiles
    • Building Tile Loads
    • Covering the Whole Matrix
    • Finishing Tiled Matrix Multiply
  • Unit 4: Comparative Performance Analysis
    • Trusting Benchmark Results
    • Timing the Benchmark Right
    • Guarding Matrix Edges
    • Warming Up GPU Benchmarks
    • Benchmarking with Lambdas
    • Cleaning Up Benchmark Timing
    • Benchmarking Across Matrix Sizes
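
The Unit 4 titles point to warm-up runs, correct timing, and lambda-based benchmark helpers. One common way to combine these, assuming CUDA and its event API, is sketched below. The course's actual helper is not shown on this page, so `timeKernel`, the `scale` kernel, and the warm-up and iteration counts are hypothetical stand-ins.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel, used only so there is something to time.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the mean milliseconds per launch, or a negative value on a CUDA error.
// `launch` is any callable that enqueues the work to be measured.
template <typename Launch>
float timeKernel(Launch launch, int warmup, int iters) {
    // Warm-up launches absorb one-time costs such as context setup.
    for (int i = 0; i < warmup; ++i) launch();
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) launch();
    cudaEventRecord(stop);
    // Launches are asynchronous, so wait for `stop` before reading the time.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    if (err != cudaSuccess || cudaGetLastError() != cudaSuccess) return -1.0f;
    return ms / iters;
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));

    // The lambda captures the launch configuration, so the timer stays generic.
    float ms = timeKernel([=] { scale<<<(n + 255) / 256, 256>>>(d, n, 1.0f); }, 3, 20);
    std::printf("mean time per launch: %.4f ms\n", ms);

    cudaFree(d);
    return ms >= 0.0f ? 0 : 1;
}
```

Timing with events recorded on the GPU stream avoids measuring only the launch call, which returns to the host before the kernel finishes.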

