Enhancing Graph Transformer Training through Adaptive Graph Parallelism

Published: 01 Jan 2025 · Last Modified: 11 Sept 2025 · IPDPS (Workshops) 2025 · CC BY-SA 4.0
Abstract: Graph Transformers, a variant of Graph Neural Networks (GNNs), excel at capturing long-range dependencies but struggle to scale because of the quadratic complexity of their attention mechanism. We introduce a training framework that adaptively selects parallelization strategies based on the graph structure and the system configuration. By expressing attention with sparse primitives, namely sparse matrix-matrix multiplication (SpMM) and sampled dense-dense matrix multiplication (SDDMM), we accelerate sparse graph attention by up to 3.8x and reduce memory use by 77.6% compared to leading frameworks. We also apply a lightweight reordering strategy to balance workloads across devices. Our method scales efficiently to large graphs, achieving a 5.8x speedup on the ogbn-proteins dataset and a 3.7x speedup on the ogbn-products dataset in distributed training, surpassing previous parallelization methods.
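To illustrate the SDDMM/SpMM formulation of sparse graph attention mentioned in the abstract, the sketch below computes attention scores only for existing edges (an SDDMM), normalizes them row-wise, and aggregates value vectors with a sparse-dense product (an SpMM). This is a minimal NumPy/SciPy illustration under our own assumptions, not the paper's implementation; the function name and shapes are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def sparse_graph_attention(adj, Q, K, V):
    """Sparse graph attention restricted to the edges of `adj`.

    adj: scipy.sparse CSR adjacency matrix (N x N)
    Q, K, V: dense query/key/value matrices (N x d)
    """
    rows, cols = adj.nonzero()
    # SDDMM step: compute q_i . k_j only where an edge (i, j) exists
    scores = np.einsum("ed,ed->e", Q[rows], K[cols]) / np.sqrt(Q.shape[1])
    # Row-wise softmax over the sparse score matrix
    S = sp.csr_matrix((np.exp(scores - scores.max()), (rows, cols)), shape=adj.shape)
    row_sums = np.asarray(S.sum(axis=1)).ravel()
    S = sp.diags(1.0 / np.maximum(row_sums, 1e-12)) @ S
    # SpMM step: aggregate value vectors with the normalized attention weights
    return S @ V

# Usage: random sparse graph with 1000 nodes, average degree ~8, feature dim 64
N, d = 1000, 64
adj = sp.random(N, N, density=8 / N, format="csr")
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = sparse_graph_attention(adj, Q, K, V)  # shape (N, d)
```

Because scores are materialized only for the nonzeros of the adjacency matrix, memory and compute grow with the number of edges rather than quadratically with the number of nodes, which is the motivation for the sparse-kernel approach described above.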