Abstract: Graph Neural Networks (GNNs) demonstrate remarkable learning efficacy on graph-structured data across various real-world domains. However, the scale of modern graph datasets makes it impractical to train on the complete graph with a single GPU. Subgraph-level parallel training has proven to be an effective approach for GNN training on large-scale graphs. Nonetheless, this approach presents certain issues: 1. Because boundary vertices lose their cross-partition neighbors, gradient estimates during training can be systematically biased, degrading training accuracy; 2. Existing training frameworks often ignore differences in computational capability among trainers when allocating workloads. We propose GALA-GNN to alleviate these issues: 1. We introduce a heuristic edge-partitioning algorithm, Neighbor Expansion Simulated Annealing (NESA), which minimizes the number of vertices with missing neighborhoods in each subgraph; by setting a threshold manually, it can also divide the original graph into subgraphs with deliberately uneven workloads; 2. We present Computation-Aware Subgraph Enlarge, which both prevents significant loss in training accuracy and reduces the idle time of high-computation trainers during training, thereby improving overall utilization of computational resources. We compare GALA-GNN with DGL, a state-of-the-art GNN training framework. On the medium-scale dataset Flickr, GALA-GNN achieves up to a 7.63x speedup, and on the large-scale dataset ogbn-products, it achieves up to a 4.84x speedup, without significant loss in training accuracy.
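To make the partitioning idea concrete, the following is a minimal sketch of simulated-annealing edge partitioning with a load-imbalance threshold, in the spirit of the NESA objective described above (minimizing vertices whose incident edges span multiple partitions). The function names, cooling schedule, and move set are illustrative assumptions, not the paper's actual algorithm.

```python
import math
import random

def boundary_vertices(edges, assign):
    """Count vertices whose incident edges span more than one partition
    (these are the vertices with missing neighborhoods in some subgraph)."""
    parts = {}
    for (u, v), p in zip(edges, assign):
        parts.setdefault(u, set()).add(p)
        parts.setdefault(v, set()).add(p)
    return sum(1 for ps in parts.values() if len(ps) > 1)

def sa_edge_partition(edges, k, threshold=1.2, steps=2000, t0=1.0, seed=0):
    """Illustrative simulated annealing: repeatedly move one edge to a new
    partition, accept worse moves with probability exp(-delta / T), and
    reject moves that push a partition past threshold * ideal_load.
    A threshold > 1 permits deliberately uneven partition sizes."""
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in edges]
    load = [0] * k
    for p in assign:
        load[p] += 1
    cap = threshold * len(edges) / k  # per-partition load limit
    cost = boundary_vertices(edges, assign)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9  # linear cooling
        i = rng.randrange(len(edges))
        old_p, new_p = assign[i], rng.randrange(k)
        if new_p == old_p or load[new_p] + 1 > cap:
            continue
        assign[i] = new_p
        new_cost = boundary_vertices(edges, assign)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            load[old_p] -= 1
            load[new_p] += 1
            cost = new_cost
        else:
            assign[i] = old_p  # revert rejected move
    return assign, cost
```

For example, partitioning two disjoint 4-cycles into two parts should tend toward a low boundary-vertex count, since each cycle can live entirely in one subgraph.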