Abstract: Graph neural networks (GNNs) have emerged as powerful learning tools for unstructured data and have been successfully applied to many graph-based application domains. Existing GNN training frameworks commonly use sampling-based training to handle large-scale graphs. However, this approach suffers from long memory access latency, neighborhood explosion during mini-batch sampling, and inefficient loading of vertex features from CPU to GPU. In this paper, we propose 2PGraph, a system that supports high-speed locality-aware mini-batch sampling and GNN layer-aware feature caching. 2PGraph significantly reduces sampling time through vertex-cluster sampling, which improves the locality of vertex accesses and limits the range of neighborhood expansion. To further reduce sampling time in a distributed environment, we renumber the vertices of each subgraph after graph partitioning, which improves the data locality of each partition. 2PGraph also avoids excessive data transfer between CPU and GPU by caching feature data on available GPU resources, achieving a 100% hit rate. Furthermore, 2PGraph adopts a GNN layer-aware feature caching policy during data-parallel training, achieving better cache efficiency and memory utilization. We evaluate 2PGraph against two state-of-the-art industrial GNN frameworks, PyG and DGL, on a diverse array of benchmarks. Experimental results show that 2PGraph reduces mini-batch sampling time by up to 90% and data loading time by up to 99%, and achieves up to 8.7× speedup over the state-of-the-art baselines on an 8-GPU cluster.
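The sketch below is a minimal, hypothetical illustration (not the authors' implementation) of the two ideas named in the abstract: restricting mini-batch sampling to a vertex cluster to bound neighborhood expansion, and keeping selected vertex features resident on the GPU so feature loading avoids repeated CPU-GPU transfers. All class and parameter names (`VertexClusterSampler`, `GPUFeatureCache`, `fanout`, etc.) are assumptions made for illustration only.

```python
import random
from collections import defaultdict

import torch


class VertexClusterSampler:
    """Samples mini-batches whose neighborhood expansion stays inside one cluster."""

    def __init__(self, adj_lists, cluster_of, batch_size, fanout):
        self.adj = adj_lists            # dict: vertex -> list of neighbor vertices
        self.cluster_of = cluster_of    # dict: vertex -> cluster id
        self.batch_size = batch_size
        self.fanout = fanout
        # Group vertices by cluster so each mini-batch touches a compact region.
        self.clusters = defaultdict(list)
        for v, c in cluster_of.items():
            self.clusters[c].append(v)

    def sample(self):
        # Pick one cluster, then draw seeds and sampled neighbors only from it,
        # which improves access locality and limits neighborhood explosion.
        cid = random.choice(list(self.clusters))
        members = set(self.clusters[cid])
        seeds = random.sample(self.clusters[cid],
                              min(self.batch_size, len(self.clusters[cid])))
        sampled = set(seeds)
        for v in seeds:
            cands = [u for u in self.adj.get(v, []) if u in members]
            sampled.update(random.sample(cands, min(self.fanout, len(cands))))
        return seeds, sorted(sampled)


class GPUFeatureCache:
    """Keeps the features of a chosen set of vertices resident on the GPU."""

    def __init__(self, cpu_feats, cached_ids, device="cuda"):
        self.device = device
        self.cpu_feats = cpu_feats
        self.slot = {v: i for i, v in enumerate(cached_ids)}
        self.gpu_feats = cpu_feats[torch.tensor(cached_ids)].to(device)

    def gather(self, vertex_ids):
        # Cache hit: read from the GPU-resident copy; miss: fall back to a
        # per-vertex CPU-to-GPU transfer.
        rows = []
        for v in vertex_ids:
            if v in self.slot:
                rows.append(self.gpu_feats[self.slot[v]])
            else:
                rows.append(self.cpu_feats[v].to(self.device))
        return torch.stack(rows)
```

In this toy setup, a training step would call `sampler.sample()` to obtain seeds and their cluster-local neighborhood, then `cache.gather(...)` to assemble the feature matrix for that mini-batch; the paper's layer-aware policy would decide which vertex IDs to place in `cached_ids`.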