Voltrix - Sparse Matrix-Matrix Multiplication on Tensor Cores with Asynchronous and Balanced Kernel Optimization

USENIX via YouTube

Overview

Learn about Voltrix-SpMM, a GPU kernel design for sparse matrix-matrix multiplication (SpMM) on Tensor Cores, presented at USENIX ATC '25. Discover how researchers from Wuhan University, Nvidia Corporation, and the University of Macau address the fundamental mismatch between inherently sparse matrices and the dense computational patterns Tensor Cores expect. Explore the asynchronous data-loading pipeline, which pairs a bit-wise compressed format for sparse matrices with bulk memory-copy instructions for dense matrices, built on a warp-specialized producer-consumer model that overlaps data loading with computation. Examine the persistent and I/O co-balanced kernel mechanism, whose two-stage partition strategy balances input and output work across the GPU. Understand how the CUDA 12.6 implementation delivers substantial performance improvements, with average speedups of 36.5x over TC-GNN, 1.8x over DTC-SpMM, and 1.7x over RoDe, unlocking the computational potential of Tensor Cores for SpMM in scientific computing and machine learning applications.
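The warp-specialized producer-consumer overlap described above can be sketched in highly simplified form with libcu++'s `cuda::pipeline`. Everything below is an illustrative assumption, not Voltrix's actual kernel: the tile size, buffer count, kernel name, and the plain shared-memory copy stand in for the paper's bit-wise compressed loads and Tensor Core MMA, which are not shown.

```cuda
// Hypothetical sketch: one loader warp feeds compute warps through a
// double-buffered shared-memory pipeline, so copies of tile t+1 overlap
// with computation on tile t. Not Voltrix's real kernel.
#include <cuda/pipeline>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int TILE   = 16 * 16;  // elements per tile (assumption)
constexpr int STAGES = 2;        // pipeline depth (double buffering)

__global__ void spmm_pipeline_sketch(const __half* a_tiles, int n_tiles) {
    __shared__ __half buf[STAGES][TILE];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block,
                                           STAGES> state;

    auto block    = cg::this_thread_block();
    bool producer = (threadIdx.x / 32) == 0;  // warp 0 loads, others compute
    auto role     = producer ? cuda::pipeline_role::producer
                             : cuda::pipeline_role::consumer;
    auto pipe     = cuda::make_pipeline(block, &state, role);

    for (int t = 0; t < n_tiles; ++t) {
        if (producer) {
            pipe.producer_acquire();
            int lane = threadIdx.x;  // 32 lanes each copy one slice
            cuda::memcpy_async(&buf[t % STAGES][lane * (TILE / 32)],
                               a_tiles + t * TILE + lane * (TILE / 32),
                               sizeof(__half) * (TILE / 32), pipe);
            pipe.producer_commit();
        } else {
            pipe.consumer_wait();
            // Tensor Core work (wmma / mma.sync) on buf[t % STAGES] would
            // go here, overlapping the producer warp's next async copy.
            pipe.consumer_release();
        }
    }
}
```

The point of the sketch is the role split: producer threads only acquire and commit pipeline stages while consumer threads only wait and release them, which is what lets memory traffic and Tensor Core math proceed concurrently within one thread block.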

Syllabus

USENIX ATC '25 - Voltrix: Sparse Matrix-Matrix Multiplication on Tensor Cores with Asynchronous and Balanced Kernel Optimization

Taught by

USENIX

