Turning A Curse into A Blessing: Data-Aware Memory-Efficient Training of Graph Neural Networks by Dynamic Exiting
Abstract: Training Graph Neural Networks (GNNs) efficiently remains a challenge due to high memory demands, especially during recursive neighborhood aggregation. Traditional sampling-based GNN training methods often overlook the data's inherent structure, such as the power-law degree distribution observed in most real-world graphs, which results in inefficient memory usage and processing. We introduce a novel framework, Memory-Aware Dynamic Exiting GNN (MADE-GNN), which capitalizes on the power-law nature of graph data to improve training efficiency. MADE-GNN is data-aware, dynamically adjusting the depth of feature aggregation based on the connectivity of each node: it routes well-connected "head" nodes through extensive aggregation while allowing sparsely connected "tail" nodes to exit early, reducing memory consumption without sacrificing model performance. This approach not only addresses the challenge of memory-intensive GNN training but also turns the power-law distribution from a traditional "curse" into a strategic "blessing". By enabling partial weight sharing between the early-exit mechanism and the full model, MADE-GNN improves the representation of cold-start nodes, leveraging structural information from head nodes to enhance generalization across the network. Extensive evaluations on multiple public benchmarks, including industrial-scale graphs, show that MADE-GNN outperforms existing GNN training methods in both memory efficiency and predictive performance, with particularly large gains for tail nodes. This demonstrates MADE-GNN's potential as a versatile solution for GNN applications facing similar scalability and distribution challenges.
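To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of degree-aware dynamic exiting in plain PyTorch; it is not the authors' implementation. The class names (`DegreeAwareExitGNN`, `MeanAggLayer`), the fixed degree threshold, the mean-aggregation layer, and the dense adjacency input are all illustrative assumptions chosen for brevity, and the sketch computes both branches for clarity whereas an actual memory-efficient variant would skip deeper layers for tail nodes.

```python
# Hypothetical sketch of degree-based early exiting with a shared exit head.
# Assumptions: dense (N, N) adjacency with self-loops, mean aggregation,
# and a hard degree threshold separating "head" from "tail" nodes.
import torch
import torch.nn as nn


class MeanAggLayer(nn.Module):
    """One round of mean neighborhood aggregation followed by a linear map."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero
        return torch.relu(self.lin(adj @ x / deg))


class DegreeAwareExitGNN(nn.Module):
    """Tail (low-degree) nodes exit after the first layer; head nodes use the
    full depth. The classifier is shared by both exits (partial weight sharing)."""

    def __init__(self, in_dim, hid_dim, n_classes, depth=3, degree_threshold=5):
        super().__init__()
        self.layers = nn.ModuleList(
            [MeanAggLayer(in_dim, hid_dim)]
            + [MeanAggLayer(hid_dim, hid_dim) for _ in range(depth - 1)]
        )
        self.classifier = nn.Linear(hid_dim, n_classes)  # shared exit head
        self.degree_threshold = degree_threshold

    def forward(self, x, adj):
        degree = adj.sum(dim=1)
        h = self.layers[0](x, adj)
        early_out = self.classifier(h)        # shallow exit for tail nodes
        for layer in self.layers[1:]:
            h = layer(h, adj)                 # deeper aggregation for head nodes
        deep_out = self.classifier(h)
        is_tail = (degree < self.degree_threshold).unsqueeze(-1)
        # Route each node's prediction through its assigned exit.
        return torch.where(is_tail, early_out, deep_out)


# Usage on a toy graph: 4 nodes, 8-dim features, 3 classes.
if __name__ == "__main__":
    adj = torch.eye(4) + torch.tensor(
        [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]], dtype=torch.float
    )
    x = torch.randn(4, 8)
    model = DegreeAwareExitGNN(in_dim=8, hid_dim=16, n_classes=3, degree_threshold=3)
    print(model(x, adj).shape)  # torch.Size([4, 3])
```

In this sketch the memory benefit described in the abstract would come from restricting deep aggregation (and its stored activations) to head nodes only; the threshold-based routing shown here is one plausible realization of that idea.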