History Driven Sampling for Scalable Graph Neural Networks

Published: 01 Jan 2024, Last Modified: 20 May 2025 · DASFAA (6) 2024 · CC BY-SA 4.0
Abstract: Graph Neural Networks (GNNs), which achieve great success in numerous scenarios, suffer from unsustainable computational and storage costs because the neighbourhood size grows exponentially as the number of layers increases. Graph sampling is therefore one of the prevalent methods for improving GNNs' scalability. Despite its benefits, the variance introduced by sampling severely decelerates convergence and degrades GNN performance. To address this, we leverage Lagrange multipliers to solve the constrained non-linear variance-reduction problem, yielding the theoretical minimum-variance sampler. However, putting this sampler into practice faces a key challenge: circular dependency, i.e., the sampling probability of the minimum-variance sampler is determined by node embeddings, while node embeddings cannot be computed until sampling has finished. We therefore propose a general framework named History Driven Sampling for scalable GNNs (HDSGNN). HDSGNN estimates the minimum-variance sampler from historical node embeddings to break the circular dependency, and then employs this estimated sampler for computation-graph sampling and representation learning. We instantiate HDSGNN in both layer-wise and subgraph sampling. Comprehensive experiments on seven representative benchmarks verify HDSGNN's effectiveness and efficiency over state-of-the-art baselines.
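To make the history-driven idea concrete, the sketch below illustrates one plausible instantiation of breaking the circular dependency: sampling probabilities are estimated from a cache of embeddings computed in earlier iterations, and the cache is refreshed after each forward pass. The class name `HistoryDrivenSampler`, the cache `hist_emb`, and the score `||A[:, v]|| * ||h_hist(v)||` (a common importance-sampling heuristic from the layer-wise sampling literature) are assumptions for illustration; the paper's exact minimum-variance formula is not given in the abstract.

```python
import numpy as np

class HistoryDrivenSampler:
    """Hypothetical sketch of history-driven layer-wise sampling."""

    def __init__(self, num_nodes, emb_dim):
        # Historical embedding cache, refreshed after each forward pass.
        self.hist_emb = np.zeros((num_nodes, emb_dim))

    def probabilities(self, candidate_nodes, adj_col_norms):
        # Estimate the minimum-variance distribution from cached embeddings:
        # p(v) ∝ ||A[:, v]|| * ||h_hist(v)||  (assumed heuristic form).
        scores = adj_col_norms[candidate_nodes] * np.linalg.norm(
            self.hist_emb[candidate_nodes], axis=1
        )
        scores += 1e-12  # keep nodes with an empty history sampleable
        return scores / scores.sum()

    def sample(self, candidate_nodes, adj_col_norms, k, rng):
        # Draw k candidates without replacement under the estimated sampler.
        p = self.probabilities(candidate_nodes, adj_col_norms)
        k = min(k, len(candidate_nodes))
        return rng.choice(candidate_nodes, size=k, replace=False, p=p)

    def update_history(self, nodes, new_emb):
        # Break the circular dependency: after embeddings are computed for
        # the sampled nodes, store them for the next round's estimate.
        self.hist_emb[nodes] = new_emb


# Usage example (all sizes are illustrative):
rng = np.random.default_rng(0)
sampler = HistoryDrivenSampler(num_nodes=1000, emb_dim=64)
candidates = np.arange(1000)
col_norms = np.ones(1000)  # placeholder for per-node adjacency column norms
picked = sampler.sample(candidates, col_norms, k=32, rng=rng)
sampler.update_history(picked, np.random.randn(len(picked), 64))
```

The key design point is that the sampler never waits on the current iteration's embeddings: it always acts on the previous round's cache, trading a small staleness bias for a tractable estimate of the otherwise uncomputable minimum-variance distribution.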