Distributed Training of Large Graph Neural Networks with Variable Communication Rates

TMLR Paper 2004 Authors

03 Jan 2024 (modified: 18 Apr 2024) · Rejected by TMLR
Abstract: Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to their large memory and computing requirements. Distributed GNN training, in which the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs. However, as the graph generally cannot be decomposed into small non-interacting components, data communication between the training machines quickly limits training speed. Compressing the communicated node activations by a fixed amount improves training speed, but lowers the accuracy of the trained GNN. In this paper, we introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model. Based on our theoretical analysis, we derive a variable compression method that converges to a solution equivalent to the full-communication case. Our empirical results show that our method attains performance comparable to that of full communication and that, for any communication budget, it outperforms compressing the communication at any fixed ratio.
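The abstract only sketches the idea at a high level. As a rough illustration of the general concept (not the authors' actual algorithm or code), the sketch below simulates compressing the boundary-node activations that a partition would send to its neighbors, with a compression ratio that is relaxed over the course of training so later epochs approach full communication. All names (top_k_compress, compression_schedule, num_boundary_nodes, hidden_dim) and the linear schedule are hypothetical choices for illustration.

```python
# Minimal sketch, assuming a top-k sparsification of exchanged node activations
# and a linear keep-ratio schedule; this is NOT the paper's method.
import numpy as np


def top_k_compress(activations: np.ndarray, keep_ratio: float):
    """Keep only the largest-magnitude entries per node; zero out the rest.

    Returns the compressed activations and the number of values actually sent.
    """
    k = max(1, int(round(keep_ratio * activations.shape[1])))
    # Indices of the k largest-magnitude features for each boundary node.
    idx = np.argpartition(-np.abs(activations), k - 1, axis=1)[:, :k]
    compressed = np.zeros_like(activations)
    rows = np.arange(activations.shape[0])[:, None]
    compressed[rows, idx] = activations[rows, idx]
    return compressed, activations.shape[0] * k


def compression_schedule(epoch: int, num_epochs: int,
                         start_ratio: float = 0.1, end_ratio: float = 1.0):
    """Variable compression: aggressive early on, near-full communication late."""
    t = epoch / max(1, num_epochs - 1)
    return start_ratio + t * (end_ratio - start_ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_boundary_nodes, hidden_dim, num_epochs = 1024, 64, 10
    for epoch in range(num_epochs):
        ratio = compression_schedule(epoch, num_epochs)
        # Stand-in for the activations a partition would send to its neighbors.
        acts = rng.standard_normal((num_boundary_nodes, hidden_dim)).astype(np.float32)
        sent, n_values = top_k_compress(acts, ratio)
        full = num_boundary_nodes * hidden_dim
        print(f"epoch {epoch}: keep_ratio={ratio:.2f}, "
              f"communicated {n_values}/{full} values")
```

The point of the sketch is only the accounting: a fixed ratio trades accuracy for bandwidth uniformly, whereas a schedule that tightens toward full communication can, in principle, recover full-communication accuracy while spending less total communication, which is the trade-off the paper analyzes.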
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=9EStbgqIvs
Changes Since Last Submission: The previous version was desk-rejected due to incorrect formatting: "Margins appear shifted from template defaults, please revisit and resubmit." The formatting has been corrected.
Assigned Action Editor: ~Yujia_Li1
Submission Number: 2004