HierBatching: Locality-Aware Out-of-Core Training of Graph Neural Networks

Published: 01 Feb 2023 · Last Modified: 13 Feb 2023 · Submitted to ICLR 2023 · Readers: Everyone
Keywords: Graph Neural Network, Out-of-Core Training, Spatial Locality, Temporal Locality, Hierarchical Batching
TL;DR: A locality-aware out-of-core training approach for Graph Neural Networks that is an order of magnitude faster without compromising accuracy
Abstract: As graph neural networks (GNNs) become increasingly popular for analyzing data organized as massive graphs, how these models can be trained efficiently on economical computing resources becomes a critical question that influences the widespread adoption of GNNs in practice. We consider the use of a single commodity machine constrained by limited main memory but attached to ample external storage. In this under-explored scenario, not only does the feature data often exceed the memory capacity, but the graph structure may not fit in memory either. With data stored on disk, gathering features and constructing neighborhood subgraphs in usual mini-batch training incur inefficient random accesses and expensive data movement. To overcome this bottleneck, we propose a locality-aware training scheme, coined HierBatching, that significantly increases sequential disk access while maintaining the random nature of stochastic training and its quality. HierBatching exploits the memory hierarchy of a modern GPU machine and constructs batches in an analogously hierarchical manner. Graph nodes are organized into many partitions, each laid out contiguously on disk for maximal spatial locality, while the main memory stores random partitions and is treated as a cache of the disk; its contents are reused multiple times to improve temporal locality. We conduct comprehensive experiments, including locality ablations, to demonstrate that HierBatching is economical, fast, and accurate.
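To make the batching scheme described in the abstract concrete, the following Python sketch shows one plausible shape of the training loop. It is inferred from the abstract only, not from the authors' code: load_partition, sample_minibatch, train_step, mem_capacity, and reuse_factor are all hypothetical placeholders.

```python
import random

# A minimal sketch of hierarchical batching, assuming the graph's nodes
# have been split into partitions that are each stored contiguously on
# disk. All callables and parameters below are hypothetical.

def hier_batching_epoch(partition_ids, mem_capacity, reuse_factor,
                        load_partition, sample_minibatch, train_step):
    """Run one training epoch with disk- and memory-aware batching.

    partition_ids  -- node partitions, each laid out contiguously on disk
    mem_capacity   -- number of partitions main memory can cache at once
    reuse_factor   -- mini-batches drawn per cached set of partitions
    """
    random.shuffle(partition_ids)  # keep training stochastic across epochs
    for i in range(0, len(partition_ids), mem_capacity):
        # Spatial locality: each partition load is one large sequential
        # disk read rather than many scattered random accesses.
        cached = [load_partition(pid)
                  for pid in partition_ids[i:i + mem_capacity]]
        # Temporal locality: treat main memory as a cache of the disk and
        # reuse its contents for several mini-batches before evicting,
        # amortizing the disk I/O across gradient steps.
        for _ in range(reuse_factor):
            train_step(sample_minibatch(cached))
```

In this reading, reuse_factor trades off I/O cost against batch randomness: larger values amortize disk reads further but draw more consecutive mini-batches from the same cached partitions.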
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning