High Performance Unstructured SpMM Computation Using Tensor Cores
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Overview
This conference talk presents "High Performance Unstructured SpMM Computation Using Tensor Cores," research on optimizing sparse matrix-matrix multiplication. It shows how the SMaT (Sparse Matrix Matrix Tensor Core-accelerated) library uses Tensor Cores efficiently for unstructured sparse matrices, overcoming the hardware constraints that typically limit sparse computations to structured patterns. The presentation explains how the library builds on the low-level CUDA MMA API to maximize GPU performance, together with algorithmic optimizations such as a sparse-matrix permutation that minimizes the number of non-zero blocks. Evaluation results show SMaT outperforming state-of-the-art libraries by up to 125x (2.6x on average) on NVIDIA A100 GPUs. The talk introduces the problem, details the SMaT approach and its performance-modeling methodology, presents comprehensive evaluation results, and concludes with implications for scientific computing, large-model training, and inference. The 31-minute presentation from ETH Zurich's Scalable Parallel Computing Lab was delivered at SC '24, the International Conference for High Performance Computing, Networking, Storage, and Analysis.
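To make the permutation idea concrete: when a sparse matrix is tiled into fixed-size blocks for Tensor Cores, reordering rows so that rows with similar sparsity patterns sit in the same block row can shrink the number of non-zero blocks that must be processed. The sketch below is only an illustration of that effect on a toy matrix; the counting function, the hand-picked permutation, and the matrix itself are illustrative assumptions, not SMaT's actual reordering algorithm.

```python
import numpy as np

def count_nonzero_blocks(A, b):
    """Count the b x b tiles of A that contain at least one non-zero.
    Each such tile would cost one Tensor Core MMA in a blocked SpMM."""
    rows, cols = A.shape
    count = 0
    for i in range(0, rows, b):
        for j in range(0, cols, b):
            if np.any(A[i:i + b, j:j + b]):
                count += 1
    return count

# Toy 4x4 sparse matrix: rows 0 and 2 share one sparsity pattern,
# rows 1 and 3 share another (hypothetical example).
A = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=np.float32)

b = 2
print(count_nonzero_blocks(A, b))             # natural order: 4 non-zero blocks

perm = [0, 2, 1, 3]                           # group rows with matching patterns
print(count_nonzero_blocks(A[perm], b))       # after permutation: 2 non-zero blocks
```

Grouping the two matching rows into each block row halves the number of dense tiles here, which directly reduces the MMA work a blocked SpMM kernel would issue.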
Syllabus
00:00 Introduction
03:45 SMaT: Sparse Matrix Matrix Tensor Core-accelerated
06:10 Performance Model
09:35 Evaluation
21:30 Conclusion
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich