- Keywords: Layerwise Sampling, Graph Neural Networks, Attention Mechanism
- Abstract: The Graph Convolutional Network (GCN) and its variants are powerful models for graph representation learning and have recently achieved great success on many graph-based applications. However, most of them target on shallow models (e.g. 2 layers) on relatively small graphs. Very recently, although many acceleration methods have been developed for GCNs training, it still remains a severe challenge how to scale GCN-like models to larger graphs and deeper layers due to the over-expansion of neighborhoods across layers. In this paper, to address the above challenge, we propose a novel layer-wise sampling strategy, which samples the nodes layer by layer conditionally based on the factors of the bi-directional diffusion between layers. In this way, we potentially restrict the time complexity linear to the number of layers, and construct a mini-batch of nodes with high local bi-directional influence (correlation). Further, we apply the self-attention mechanism to flexibly learn suitable weights for the sampled nodes, which allows the model to be able to incorporate both the first-order and higher-order proximities during a single layer propagation process without extra recursive propagation or skip connection. Extensive experiments on three large benchmark graphs demonstrate the effectiveness and efficiency of the proposed model.