MSA: An Efficient Sparsity-Aware Accelerator for Matrix Multiplication with Multi-core Systolic Arrays
Abstract: Conventional architectures built from multiple systolic arrays excel at dense matrix multiplication. However, they compute inefficiently when the matrices are sparse. In this paper, we propose MSA2, an accelerator for efficient sparse matrix multiplication based on a multi-core architecture with loosely coupled CPUs and systolic arrays. To address load imbalance, we introduce a sparsity-aware strategy that flexibly allocates workloads to individual cores. Additionally, we propose a sparse matrix representation tailored to the systolic array, which eliminates the dependency on the skew module and thereby saves area. In our experiments, load imbalance is reduced by up to 3.1×. On average, MSA2 achieves an improvement of 17.5× over MACO, 5.5× over MACO with preprocessing, 1.4× over Mentha, and 2.6× over SIGMA, while reducing the runtime of BERT-base, VGG19, and ResNet50 by up to 58% compared to Mentha and SIGMA.
External IDs: dblp:conf/ica3pp/TangWSYXS24