Abstract: Graph neural networks (GNNs) are a powerful approach for machine learning on graph datasets. Such datasets often consist of millions of modestly-sized graphs, making them well-suited for data-parallel training. However, existing methods show poor scaling due to load imbalances and kernel overheads. We propose an optimized 2D scatter-gather based representation of GNNs that is amenable to distributed, data-parallel training without changing the underlying mathematics of the GNN. By padding graph data to a fixed size on each process, we can simplify data ingestion, make use of efficient compute kernels, equally distribute computation load, and reduce overheads. We benchmark edge-conditioned GNNs with the PCQM4M-LSC and OGB-PPA datasets. Our implementation shows better runtime performance than the state-of-the-art, with a $12\times$ strong-scaling speedup on 16 GPUs and an $89.4\times$ weak-scaling speedup on 100 GPUs.
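To make the fixed-size padding and scatter-gather idea concrete, below is a minimal, illustrative sketch (not the authors' implementation); all function names, shapes, and the choice of NumPy are assumptions. Each graph's node features, edge index, and edge attributes are padded to common maxima with masks, so every process handles identically shaped arrays, and one message-passing step is expressed as a gather followed by a scatter-add in which padded edges contribute nothing.

```python
import numpy as np

def pad_graph(x, edge_index, edge_attr, max_nodes, max_edges):
    """Pad one graph to fixed node/edge counts so every process sees
    identically shaped arrays (hypothetical helper for illustration)."""
    n, f = x.shape
    e = edge_index.shape[1]
    x_pad = np.zeros((max_nodes, f), dtype=x.dtype)
    x_pad[:n] = x
    ei_pad = np.zeros((2, max_edges), dtype=edge_index.dtype)
    ei_pad[:, :e] = edge_index          # padded edges point at node 0 but are masked out
    ea_pad = np.zeros((max_edges, edge_attr.shape[1]), dtype=edge_attr.dtype)
    ea_pad[:e] = edge_attr
    node_mask = np.zeros(max_nodes, dtype=x.dtype)
    node_mask[:n] = 1.0
    edge_mask = np.zeros(max_edges, dtype=x.dtype)
    edge_mask[:e] = 1.0
    return x_pad, ei_pad, ea_pad, node_mask, edge_mask

def scatter_gather_step(x_pad, ei_pad, edge_mask):
    """One message-passing step over the padded graph: gather source
    features per edge, zero out padded edges, scatter-add into targets."""
    src, dst = ei_pad
    messages = x_pad[src] * edge_mask[:, None]   # gather
    out = np.zeros_like(x_pad)
    np.add.at(out, dst, messages)                # scatter-add
    return out
```

Because every padded graph has the same shape, batches can be stacked and distributed across processes without per-graph bookkeeping, which is the property the abstract attributes to reduced kernel overheads and balanced load.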