RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration

Published at MICRO 2023. Last modified: 14 Feb 2025. License: CC BY-SA 4.0.
Abstract: This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse Deep Neural Networks (DNNs) with two key design goals: (1) native support for both training and inference and (2) high efficiency across all sparsity degrees. To achieve the first goal, RM-STC employs a uniform sparse encoding scheme that natively supports all operations in both the forward and backward passes, eliminating the need for costly sparse-encoding transformations between them. For the second goal, RM-STC draws inspiration from the row-merge dataflow and combines input-gathering and output-scattering hardware features to minimize energy overhead. Experiments show that RM-STC achieves significant speedups and energy-efficiency improvements over dense tensor cores and prior sparse tensor cores.
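To make the gather/merge idea behind a row-merge-style dataflow concrete, here is a minimal software sketch of a sparse-times-dense matrix multiply written as an explicit gather, multiply-accumulate, and merge loop. This is an illustrative assumption, not the RM-STC hardware, its sparse encoding, or its actual dataflow: the CSR layout, the function name `row_merge_spmm`, and the row-aligned output handling are all choices made here for clarity.

```python
import numpy as np

def row_merge_spmm(values, col_idx, row_ptr, B):
    """Sparse (CSR) x dense matrix multiply expressed as an explicit
    gather -> multiply-accumulate -> merge loop, loosely mirroring a
    row-merge style dataflow (illustrative sketch only)."""
    M = len(row_ptr) - 1
    N = B.shape[1]
    C = np.zeros((M, N), dtype=B.dtype)
    for m in range(M):                       # one output row per sparse input row
        for p in range(row_ptr[m], row_ptr[m + 1]):
            gathered = B[col_idx[p]]         # input gathering: fetch only the needed row of B
            C[m] += values[p] * gathered     # merge the partial product into the output row
    return C                                 # output rows are already aligned; no extra scatter needed here

# Usage: a 3x4 sparse A (in CSR form) times a dense 4x2 B
values  = np.array([2.0, 1.0, 3.0])
col_idx = np.array([1, 3, 0])
row_ptr = np.array([0, 2, 2, 3])             # row 1 has no nonzeros
B = np.arange(8, dtype=float).reshape(4, 2)
A = np.zeros((3, 4)); A[0, 1], A[0, 3], A[2, 0] = 2.0, 1.0, 3.0
assert np.allclose(row_merge_spmm(values, col_idx, row_ptr, B), A @ B)
```

The key property this sketch illustrates is that only the rows of the dense operand corresponding to nonzero entries are touched, so work and data movement scale with the number of nonzeros rather than with the dense problem size, regardless of the sparsity degree.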