Voltrix - Sparse Matrix-Matrix Multiplication on Tensor Cores with Asynchronous and Balanced Kernel Optimization

Overview

Learn about Voltrix-SpMM, a GPU kernel design for sparse matrix-matrix multiplication (SpMM) on Tensor Cores, presented at USENIX ATC '25. Discover how researchers from Wuhan University, NVIDIA Corporation, and the University of Macau address the core difficulty of using Tensor Cores for sparse computation: Tensor Cores are built for dense tiles, while the input matrices are irregularly sparse. Explore the asynchronous data loading pipeline, which stores sparse matrices in a bit-wise compressed format, moves dense matrices with bulk memory copy instructions, and uses a warp-specialized producer-consumer model to overlap data loading with computation. Examine the persistent, I/O co-balanced kernel mechanism and its two-stage partition strategy, which is designed to balance input and output work. Understand how the implementation, built on CUDA 12.6, achieves average speedups of 36.5x over TC-GNN, 1.8x over DTC-SpMM, and 1.7x over RoDe on SpMM workloads from scientific computing and machine learning.
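To make the idea of a bit-wise compressed sparse format more concrete, here is a minimal Python sketch. It is not the Voltrix format, whose details are not given in this overview; it only illustrates the general technique of storing a small tile as an occupancy bitmask plus a packed list of nonzero values. The 8x8 tile size and the names `compress_tile`, `lookup`, and `decompress_tile` are assumptions chosen for illustration.

```python
TILE = 8  # hypothetical tile edge: 8 x 8 = 64 entries, so one 64-bit mask per tile


def compress_tile(tile):
    """Return (mask, values): bit r*TILE+c of mask is set when tile[r][c] != 0."""
    mask = 0
    values = []
    for r in range(TILE):
        for c in range(TILE):
            v = tile[r][c]
            if v != 0:
                mask |= 1 << (r * TILE + c)
                values.append(v)  # nonzeros are packed in row-major order
    return mask, values


def lookup(mask, values, r, c):
    """Read one entry without expanding the tile."""
    bit = r * TILE + c
    if not (mask >> bit) & 1:
        return 0
    # The value's index is the number of set bits below this one.
    # On a GPU this rank would typically be a hardware popcount.
    rank = bin(mask & ((1 << bit) - 1)).count("1")
    return values[rank]


def decompress_tile(mask, values):
    """Rebuild the dense tile, e.g. before feeding a dense matrix unit."""
    return [[lookup(mask, values, r, c) for c in range(TILE)] for r in range(TILE)]


if __name__ == "__main__":
    tile = [[0] * TILE for _ in range(TILE)]
    tile[0][0], tile[1][2], tile[7][7] = 1.5, 2.0, -3.0
    mask, values = compress_tile(tile)
    print(hex(mask), values)
    assert decompress_tile(mask, values) == tile
```

In this sketch a tile with k nonzeros costs one 64-bit mask plus k values instead of 64 values, so less data has to be moved for very sparse tiles, while the dense layout can still be rebuilt cheaply when it is needed.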

Syllabus

USENIX ATC '25 - Voltrix: Sparse Matrix-Matrix Multiplication on Tensor Cores with Asynchronous...

Taught by

USENIX

High Performance Unstructured SpMM Computation Using Tensor Cores

TC-GNN - Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Unstructured Sparsity Meets Tensor Cores - Lessons from Sparse Attention and MoE

GeneralSparse - Bridging the Gap in SpMM for Pruned Large Language Model Inference on GPUs
