NeutronHeter: Optimizing Distributed Graph Neural Network Training for Heterogeneous Clusters

Chunyu Cao, Xin Ai, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Hao Yuan, Mingyi Cao, Chaoyi Chen, Yingyou Wen, Yu Gu, Ge Yu

Published: 2025, Last Modified: 02 Feb 2026, Proc. ACM Manag. Data 2025, CC BY-SA 4.0

Abstract: Distributed training is critical for scaling graph neural networks (GNNs) to large graphs. However, existing systems struggle to train GNNs efficiently in heterogeneous clusters because of two main challenges. First, mapping GNN workloads to heterogeneous clusters requires matching both computation and communication heterogeneity while minimizing communication overhead, which results in a non-trivial multi-constrained multi-way workload mapping problem. Second, the positive correlation between the communication and computational workloads of GNNs makes workload mapping difficult, especially in heterogeneous clusters with asymmetric communication bandwidth and computational power (e.g., high-end GPUs with low-bandwidth networks). In this paper, we present NeutronHeter, an efficient GNN training system for heterogeneous clusters. First, we adopt a multi-level workload mapping framework. It converts the original multi-constrained multi-way workload mapping into a more efficient top-down workload mapping on a tree-like resource graph, which is constructed using hierarchical clustering based on computing power and bandwidth. By iteratively mapping the workload layer by layer, our design achieves a near-optimal solution with low overhead. Second, we adopt an adaptive communication migration strategy that reduces communication over slow connections by replicating critical remote vertices on these connections and managing their communication through faster connections. This approach significantly reduces the high communication overhead in asymmetric heterogeneous clusters with low-bandwidth links and high computational power. Experimental results on heterogeneous clusters show that NeutronHeter achieves speedups ranging from 1.06x to 33.05x over state-of-the-art frameworks.
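To make the top-down mapping idea concrete, the following is a minimal sketch and not NeutronHeter's actual algorithm. It assumes a toy cluster described by two hypothetical inputs: `power` (relative compute power per device) and `bandwidth` (link speed per device pair). The functions `build_tree` and `map_workload` are illustrative names. The sketch clusters devices agglomeratively by bandwidth so that slow links sit near the root, then splits a workload layer by layer in proportion to each subtree's total power. It ignores graph structure and communication volume, which the paper's mapping also has to account for.

```python
from itertools import combinations

def build_tree(power, bandwidth):
    """Agglomerative clustering: repeatedly merge the two clusters joined by
    the fastest link, so slow links end up near the root of the tree."""
    # A tree node is a device name (leaf) or a (left, right) tuple.
    clusters = {name: frozenset([name]) for name in power}  # node -> devices
    while len(clusters) > 1:
        def link(pair):
            a, b = pair
            # Average-linkage bandwidth between the two clusters.
            pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
            return sum(bandwidth[frozenset(p)] for p in pairs) / len(pairs)
        a, b = max(combinations(clusters, 2), key=link)
        clusters[(a, b)] = clusters.pop(a) | clusters.pop(b)
    (root,) = clusters
    return root

def total_power(node, power):
    if isinstance(node, tuple):
        return sum(total_power(child, power) for child in node)
    return power[node]

def map_workload(node, amount, power):
    """Top-down split: each subtree gets work proportional to its power."""
    if not isinstance(node, tuple):
        return {node: amount}
    left, right = node
    share = round(amount * total_power(left, power) / total_power(node, power))
    result = map_workload(left, share, power)
    result.update(map_workload(right, amount - share, power))
    return result

# Hypothetical cluster: two fast GPUs and two slow GPUs (made-up numbers).
power = {"a100_0": 4, "a100_1": 4, "t4_0": 1, "t4_1": 1}
bandwidth = {frozenset(p): 10 for p in combinations(power, 2)}  # slow default
bandwidth[frozenset(("a100_0", "a100_1"))] = 100  # fast intra-group link
bandwidth[frozenset(("t4_0", "t4_1"))] = 50

tree = build_tree(power, bandwidth)
assignment = map_workload(tree, 1000, power)  # e.g. 1000 training vertices
print(tree)
print(assignment)
```

Splitting at one tree level at a time turns a single multi-way decision into a sequence of small two-way decisions, which is the intuition behind mapping "layer by layer" on the resource tree.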

External IDs: dblp:journals/pacmmod/Cao0W0FYCCW0025