An Efficient GCN Accelerator Based on Workload Reorganization and Feature Reduction

Published: 01 Jan 2024 · Last Modified: 13 Nov 2024 · IEEE Trans. Circuits Syst. I Regul. Pap. 2024 · CC BY-SA 4.0
Abstract: The irregular adjacency matrix and the mismatched computation patterns of the Aggregation and Combination phases make Graph Neural Networks (GNNs) challenging to compute efficiently. This paper proposes a software and hardware co-design system that reduces computational latency and memory access through workload reorganization and feature reduction. In software, the adjacency matrix is preprocessed so that the workload is concentrated in both the feature and node dimensions, optimizing memory access and hardware utilization. Interlayer node features are analyzed with Principal Component Analysis (PCA) to determine the minimum feature vector length implied by their information redundancy, and a dedicated weight initialization is used during retraining to trim the feature vectors to that minimum length. In hardware, an efficient GCN accelerator is designed to fully support the reorganized workload through reconfigurable output-node computation. Implemented in 28-nm CMOS technology, the accelerator achieves a peak throughput of 3.3 TOPS and an energy efficiency of 2.6 TOPS/W. Compared with HyGCN, the proposed method improves overall performance by $5\times $ with a negligible accuracy loss of less than 0.5%.
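The abstract's PCA step, finding the shortest feature vector that preserves most of the information in the interlayer node features, can be illustrated with a minimal sketch. The function name `min_feature_length`, the 99% variance threshold, and the toy data below are all assumptions for illustration; the paper's actual redundancy criterion and retraining procedure are not specified in this abstract.

```python
import numpy as np

def min_feature_length(features: np.ndarray, retained: float = 0.99) -> int:
    """Smallest number of principal components whose cumulative
    explained variance reaches the `retained` fraction (assumed criterion)."""
    # Center the (num_nodes x feature_dim) matrix before PCA.
    centered = features - features.mean(axis=0)
    # Singular values of the centered matrix give per-component variance.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    # First index where cumulative variance reaches the threshold.
    return int(np.searchsorted(cum, retained) + 1)

# Toy example: 256-dim features that mostly live in a 16-dim subspace,
# mimicking the information redundancy the paper exploits.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((1000, 16)) @ rng.standard_normal((16, 256))
noisy = low_rank + 0.01 * rng.standard_normal((1000, 256))
print(min_feature_length(noisy))  # far smaller than the original 256
```

In this sketch the trimmed length would then set the width of the Combination-phase matrices, shrinking both compute and memory traffic, which is the effect the abstract attributes to feature reduction.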