BEAST-GNN: A Unified Bit Sparsity-Aware Accelerator for Graph Neural Networks

Published: 2025 · Last Modified: 07 Jan 2026 · IEEE Trans. Computers 2025 · CC BY-SA 4.0
Abstract: Graph Neural Networks (GNNs) excel at processing graph-structured data, making them well suited to tasks such as recommender systems and traffic forecasting. However, GNNs’ irregular computational patterns limit their ability to achieve low latency and high energy efficiency, particularly in edge computing environments. Current GNN accelerators focus predominantly on value sparsity and underutilize the potential performance gains from bit-level sparsity. Yet applying existing bit-serial accelerators to GNNs presents several challenges: GNNs have a more complex data flow than conventional neural networks, and irregular graph data makes data locality and load balancing difficult. To address these challenges, we propose BEAST-GNN, a bit-serial GNN accelerator that fully exploits bit-level sparsity. BEAST-GNN introduces a streamlined sparse-dense bit-matrix multiplication for optimized data flow, a column-overlapped graph partitioning method that improves data locality by reducing redundant memory accesses, and a sparse bit-counting strategy that balances the workload across processing elements (PEs). Compared to state-of-the-art accelerators, including HyGCN, GCNAX, Laconic, GROW, I-GCN, SGCN, and MEGA, BEAST-GNN achieves speedups of 21.7$\boldsymbol{\times}$, 6.4$\boldsymbol{\times}$, 10.5$\boldsymbol{\times}$, 3.7$\boldsymbol{\times}$, 4.0$\boldsymbol{\times}$, 3.3$\boldsymbol{\times}$, and 1.4$\boldsymbol{\times}$, respectively, while reducing DRAM accesses by 36.3$\boldsymbol{\times}$, 7.9$\boldsymbol{\times}$, 6.6$\boldsymbol{\times}$, 3.9$\boldsymbol{\times}$, 5.38$\boldsymbol{\times}$, 3.37$\boldsymbol{\times}$, and 1.44$\boldsymbol{\times}$. Additionally, BEAST-GNN consumes only 4.8%, 12.4%, 19.6%, 27.7%, 17.0%, 26.5%, and 82.8% of the energy required by these architectures.
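As a rough illustration of the bit-serial idea the abstract describes, the sketch below shows a multiply-accumulate loop that iterates only over the set bits of each weight, so zero bits cost nothing. This is a minimal sketch of bit-level sparsity in general, assuming unsigned fixed-point operands; it is not BEAST-GNN's actual PE datapath, and the function name `bit_serial_mac` is ours.

```python
# Minimal sketch of a bit-serial multiply-accumulate that exploits
# bit-level sparsity: work scales with the number of set bits in each
# weight rather than the full bit width. Illustrates the general
# principle only; not BEAST-GNN's actual PE design.

def bit_serial_mac(activations, weights):
    """Return sum(a * w) using shift-and-add over only the set bits of w.

    Assumes unsigned fixed-point weights. Zero bits are skipped entirely,
    so a weight like 0b0010 costs one add instead of one per bit position.
    """
    acc = 0
    for a, w in zip(activations, weights):
        while w:
            lsb = w & -w                         # isolate the lowest set bit
            acc += a << (lsb.bit_length() - 1)   # add a * 2^k for that bit
            w ^= lsb                             # clear it; zero bits never iterate
    return acc

# Only set bits contribute cycles: 0b0101 needs two adds, 0b0010 just one.
assert bit_serial_mac([3, 5], [0b0101, 0b0010]) == 3 * 0b0101 + 5 * 0b0010
```

In hardware, the same shift-and-add loop maps to a shifter plus an accumulator per PE, which is also why counting set bits gives a natural proxy for per-tile work when distributing load across PEs.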