Communication Optimization for Distributed GCN Training on ABCI Supercomputer

Published: 01 Jan 2024 · Last Modified: 14 Feb 2025 · CLUSTER Workshops 2024 · CC BY-SA 4.0
Abstract: Graph Convolutional Networks (GCNs) are widely used across various domains. However, distributed full-batch GCN training on large-scale graphs is challenging due to high communication overhead. This work presents a hybrid pre-/post-aggregation approach and an integer quantization method to reduce communication costs. With these techniques, we develop SuperGNN, a scalable distributed GCN training framework for the ABCI supercomputer. Experimental results on multiple large graph datasets show that our method achieves a speedup of up to 6× over state-of-the-art implementations, without sacrificing model accuracy.
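To give a rough sense of how integer quantization shrinks the feature messages exchanged between partitions, the sketch below quantizes a block of boundary-node features to int8 with a per-block scale and dequantizes it on the receiving side. This is only an assumed, simplified illustration (function names and shapes are hypothetical), not the paper's actual quantization scheme or communication code.

```python
# Minimal sketch: int8 quantization of node-feature messages before exchange,
# and dequantization before remote aggregation. Not the authors' implementation.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float32 feature block to int8 plus a per-block scale factor."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:          # guard against all-zero blocks
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 features from the int8 payload."""
    return q.astype(np.float32) * scale

# Example: boundary-node features that would be sent to a neighboring rank.
features = np.random.randn(1024, 128).astype(np.float32)
payload, s = quantize_int8(features)       # message is 4x smaller than float32
recovered = dequantize_int8(payload, s)    # used in the remote aggregation step
print(payload.nbytes, features.nbytes)     # 131072 vs. 524288 bytes
```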