Staleness-based Subgraph Sampling for Training GNNs on Large-Scale Graphs

Published: 23 Sept 2025, Last Modified: 22 Oct 2025 · NPGML Poster · CC BY 4.0
Keywords: subgraph sampling, large-scale GNNs training, historical embeddings, staleness
TL;DR: A staleness-based subgraph sampling method for historical-embedding-based GNN training on large graphs
Abstract: Training Graph Neural Networks (GNNs) on large-scale graphs is challenging. The main difficulty is obtaining accurate node embeddings while avoiding the neighbor explosion problem. One existing solution is to use historical embeddings: by substituting historical embeddings for the out-of-batch nodes, these methods can approximate full-batch training without dropping any input data while keeping GPU memory consumption constant. However, subgraph sampling methods specifically designed to benefit these historical-embedding-based methods remain largely unexplored. In this paper, we first analyze the approximation error in node embeddings caused by using historical embeddings for out-of-batch neighbors, and prove that this error can be minimized by minimizing the staleness of the historical embeddings of out-of-batch nodes. Based on this theoretical analysis, we design a simple yet effective Staleness score-based Subgraph Sampling method, called S3, to benefit these historical-embedding-based methods. Experimental results show that our S3 sampling method consistently improves historical-embedding-based methods and sets a new state of the art, without adding computation overhead, owing to our efficient staleness score calculation, improved re-sampling strategy, and faster training convergence.
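To make the idea concrete, below is a minimal sketch of how a staleness score-based sampler could operate, assuming the graph is pre-partitioned (e.g., with METIS) and a historical-embedding method such as GNNAutoScale caches one embedding per node. The class name, scoring rule, and update semantics here are illustrative assumptions, not the paper's actual S3 implementation; one plausible reading of the analysis is that refreshing the stalest nodes first keeps the historical embeddings served to other batches as fresh as possible.

```python
import numpy as np

class StalenessSampler:
    """Hypothetical staleness score-based subgraph sampler (illustrative only)."""

    def __init__(self, num_nodes: int, partitions: list[np.ndarray]):
        # partitions: node-id arrays of pre-computed graph partitions
        self.partitions = partitions
        # staleness[v] = training steps since node v's historical embedding
        # was last recomputed (i.e., since v was last inside a sampled batch)
        self.staleness = np.zeros(num_nodes, dtype=np.int64)

    def sample(self) -> int:
        # Score each partition by the mean staleness of its nodes and pick
        # the stalest one: refreshing those nodes minimizes the staleness of
        # the cached embeddings that other batches will read as out-of-batch
        # neighbors.
        scores = [self.staleness[p].mean() for p in self.partitions]
        return int(np.argmax(scores))

    def step(self, batch_nodes: np.ndarray) -> None:
        # Every cached embedding ages by one step; embeddings recomputed for
        # the current in-batch nodes reset to zero staleness.
        self.staleness += 1
        self.staleness[batch_nodes] = 0

# Usage: one sampling/update round per training step.
parts = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]
sampler = StalenessSampler(num_nodes=8, partitions=parts)
for _ in range(3):
    pid = sampler.sample()
    sampler.step(parts[pid])  # train on partition pid, then refresh its cache
```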
Submission Number: 113