$\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks

Zeyu Zhu; Fanrong Li; Zitao Mo; Qinghao Hu; Gang Li; Zejian Liu; Xiaoyao Liang; Jian Cheng

$\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks

Zeyu Zhu, Fanrong Li, Zitao Mo, Qinghao Hu, Gang Li, Zejian Liu, Xiaoyao Liang, Jian Cheng

Published: 01 Feb 2023, Last Modified: 22 Jun 2025ICLR 2023 posterReaders: Everyone

Keywords: Graph Neural Networks, MPNN framework, Mixed-precision, Quantization

TL;DR: We propose an Aggregation-Aware mixed-precision Quantization method that fully utilizes the property of GNNs, achieving up to $2\times$ speedup and $11.4\%$ accuracy improvement compared to the state-of-the-art quantization method on GNNs.

Abstract: As graph data size increases, the vast latency and memory consumption during inference pose a significant challenge to the real-world deployment of Graph Neural Networks (GNNs). While quantization is a powerful approach to reducing GNNs complexity, most previous works on GNNs quantization fail to exploit the unique characteristics of GNNs, suffering from severe accuracy degradation. Through an in-depth analysis of the topology of GNNs, we observe that the topology of the graph leads to significant differences between nodes, and most of the nodes in a graph appear to have a small aggregation value. Motivated by this, in this paper, we propose the Aggregation-Aware mixed-precision Quantization ($\rm A^2Q$) for GNNs, where an appropriate bitwidth is automatically learned and assigned to each node in the graph. To mitigate the vanishing gradient problem caused by sparse connections between nodes, we propose a Local Gradient method to serve the quantization error of the node features as the supervision during training. We also develop a Nearest Neighbor Strategy to deal with the generalization on unseen graphs. Extensive experiments on eight public node-level and graph-level datasets demonstrate the generality and robustness of our proposed method. Compared to the FP32 models, our method can achieve up to $18.8\times$ (i.e., 1.70bits) compression ratio with negligible accuracy degradation. Moreover, compared to the state-of-the-art quantization method, our method can achieve up to $11.4\%$ and $9.5\%$ accuracy improvements on the node-level and graph-level tasks, respectively, and up to $2\times$ speedup on a dedicated hardware accelerator.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Supplementary Material: zip

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/rm-a-2q-aggregation-aware-quantization-for/code)

32 Replies

Loading