Abstract: The computational paradigm of a graph neural network (GNN) can be abstracted as a computation graph (CG). Directly constructing CGs for large-scale graphs is computationally expensive and memory-intensive, so numerous sampling techniques generate diverse CGs in a minibatch style. However, the resulting CGs are either overly large due to the neighbor explosion problem or excessively sparse in structure, yielding insufficiently expressive node embeddings. To balance the cost and expressive ability of CGs, we introduce NESC, a two-phase framework for building lightweight yet powerful CGs. First, a simple and efficient method selects minimal yet representative subsets of nodes layer by layer, bounding the size of the CGs. Then, a novel resampling technique establishes edges between the selected node sets, which is equivalent to sampling nodes from a subset of the original neighbor sets. These restored edges mitigate the sampling bias introduced when collecting nodes, guaranteeing that each node aggregates adequate information. We evaluate NESC against five competitive sampling-based algorithms on six large graphs. Experimental results demonstrate that our approach achieves superior test accuracy, along with 1.43x–5.32x speedups over the other algorithms.