A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking

Keyu Duan; Zirui Liu; Peihao Wang; Wenqing Zheng; Kaixiong Zhou; Tianlong Chen; Xia Hu; Zhangyang Wang

Published: 17 Sept 2022, Last Modified: 20 Apr 2025, NeurIPS 2022 Datasets and Benchmarks, Readers: Everyone

Keywords: Graph Convolutional Networks, Scalability, Benchmark

TL;DR: We present a comprehensive and fair benchmark study on large-scale graph training and further propose a new layer-wise training manner that achieves new SOTA performance on large-scale graph datasets.

Abstract: Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs). Because the graph structure itself is involved in the training process, vanilla GNNs usually fail to scale up, limited by GPU memory. Although numerous scalable GNN architectures have been proposed, we still lack a comprehensive survey and fair benchmark of this reservoir from which to derive a rationale for designing scalable GNNs. To this end, we first systematically formulate the representative methods of large-scale graph training into several branches and further establish a fair and consistent benchmark for them through a greedy hyperparameter search. In addition, regarding efficiency, we theoretically evaluate the time and space complexity of the various branches and empirically compare them w.r.t. GPU memory usage, throughput, and convergence. Furthermore, we analyze the pros and cons of the various branches of scalable GNNs and then present a new ensembling training manner, named EnGCN, to address the existing issues. Remarkably, our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets. Our code is available at https://github.com/VITA-Group/Large_Scale_GCN_Benchmarking.
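
The abstract mentions a greedy hyperparameter search without spelling out the procedure. One common reading is a coordinate-wise search: tune one hyperparameter at a time and freeze its best value before moving to the next. The sketch below illustrates that reading only and is not taken from the authors' code; the names `greedy_search`, `space`, and `toy_score`, as well as the candidate values, are hypothetical.

```python
from typing import Any, Callable, Dict, List


def greedy_search(
    space: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Dict[str, Any]:
    """Tune one hyperparameter at a time, freezing the best value found so far."""
    # Start from the first candidate of every hyperparameter.
    config = {name: values[0] for name, values in space.items()}
    for name, values in space.items():
        best_value, best_score = config[name], float("-inf")
        for value in values:
            # Vary only the current hyperparameter; keep the others fixed.
            score = evaluate({**config, name: value})
            if score > best_score:
                best_value, best_score = value, score
        config[name] = best_value  # freeze before moving to the next one
    return config


# Hypothetical search space (assumption: not the grid used in the paper).
space = {
    "lr": [1e-2, 1e-3, 1e-4],
    "hidden_dim": [128, 256, 512],
    "dropout": [0.1, 0.2, 0.5],
}


def toy_score(cfg: Dict[str, Any]) -> float:
    # Toy stand-in for validation accuracy; replace with a real train/validate run.
    return -(
        abs(cfg["lr"] - 1e-3) * 100
        + abs(cfg["hidden_dim"] - 256) / 256
        + abs(cfg["dropout"] - 0.2)
    )


best = greedy_search(space, toy_score)
```

With three candidates for each of three hyperparameters, this sketch evaluates 9 configurations instead of the 27 a full grid would need. The trade-off is that interactions between hyperparameters can be missed.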

URL: https://github.com/VITA-Group/Large_Scale_GCN_Benchmarking

Dataset Url: All datasets can be accessed through PyTorch Geometric (https://github.com/pyg-team/pytorch_geometric), a graph neural network library for PyTorch.

License: MIT LICENSE

Author Statement: Yes

Supplementary Material: pdf

Contribution Process Agreement: Yes

In Person Attendance: Yes

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/a-comprehensive-study-on-large-scale-graph/code)
