TopGQ: Fast GNN Post-Training Quantization Leveraging Topology Information

18 Sept 2025 (modified: 24 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Graph Neural Networks, Quantization
TL;DR: A GNN post-training quantization method that is an order of magnitude faster than prior works while preserving the task performance of the quantized model.
Abstract: Graph Neural Networks (GNNs) demand substantial memory and computation as datasets scale in size. Quantization is a promising remedy, compressing full-precision values into low-bit representations. However, existing GNN quantization methods rely on costly gradient-based updates to preserve accuracy, and this quantization time becomes a major barrier to real-world deployment as the input graph grows. To this end, we present TopGQ (Topology-aware GNN Quantization), an accurate post-training quantization framework tailored to GNNs that removes this redundant quantization overhead. We propose dual-axis scale absorption, which applies scale factors along both activation axes and merges the node-axis scale into the static adjacency matrix. This addresses outlier nodes for higher accuracy while keeping the same computational cost as column-wise quantized inference. We further introduce topology-guided quantization, which exploits the relationship between local graph structure and activation variance: a novel node index (TopPIN) serves as a lightweight proxy of activation variance derived from local structure, enabling fast inference on unseen nodes. With these techniques, TopGQ eliminates the need for retraining while preserving accuracy. Experimental results show that TopGQ matches the accuracy of prior works while reducing quantization time by an order of magnitude, making it a practical solution for efficient and scalable GNN inference.
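The abstract's dual-axis scale absorption can be sketched as follows. Below is a minimal NumPy illustration, not the paper's implementation: the function name `dual_axis_quantize`, the symmetric max-based scale choices, and the variable names are assumptions made for exposition. It shows the core idea the abstract describes: rescale activations along both the node (row) and feature (column) axes, then absorb the node-axis scale into the static adjacency matrix so the quantized aggregation costs the same as column-wise quantized inference.

```python
import numpy as np

def dual_axis_quantize(A, X, n_bits=8):
    """Hypothetical sketch of dual-axis scale absorption.

    X is rescaled along both axes before quantization:
      - a per-node (row) scale r, which flattens outlier nodes, and
      - a per-feature (column) scale c, as in standard column-wise PTQ.
    The row scale is absorbed into the static adjacency matrix A, so the
    integer matmul keeps the same form as column-wise quantized inference.
    """
    qmax = 2 ** (n_bits - 1) - 1

    # Node-axis scale: per-row max magnitude (assumed choice for this sketch).
    r = np.abs(X).max(axis=1, keepdims=True)           # shape (N, 1)
    r[r == 0] = 1.0
    X_r = X / r                                        # outlier rows normalized

    # Feature-axis scale: per-column symmetric max-based quantization.
    c = np.abs(X_r).max(axis=0, keepdims=True) / qmax  # shape (1, F)
    c[c == 0] = 1.0
    Xq = np.clip(np.round(X_r / c), -qmax - 1, qmax)   # integer activations

    # Absorb the row scale into the (static) adjacency matrix: A_abs = A @ diag(r).
    A_abs = A * r.T
    return A_abs, Xq, c

# Usage: the aggregation Y = A @ X is approximated by (A_abs @ Xq) * c.
N, F = 6, 4
rng = np.random.default_rng(0)
A = (rng.random((N, N)) < 0.4).astype(np.float32)
X = rng.standard_normal((N, F)).astype(np.float32)
X[0] *= 50.0                                           # simulate an outlier node

A_abs, Xq, c = dual_axis_quantize(A, X)
Y_ref = A @ X
Y_q = (A_abs @ Xq) * c
print("max abs error:", np.abs(Y_ref - Y_q).max())
```

Because the per-column dequantization scale `c` multiplies on the right, it folds into the usual per-channel rescaling, and the extra row-axis scale adds no cost at inference time since it is pre-merged into the fixed adjacency matrix.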
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 11962