Abstract: Distributed full-graph training of Graph Neural Networks (GNNs) has been widely adopted to learn large-scale graphs. While recent system advancements can improve the training throughput of GNNs, their practical adoption is limited by the potential accuracy decline. This concern is particularly prominent for deeper and more intricate GNN architectures, where the accuracy degradation becomes noticeable. Moreover, existing works fail to comprehensively consider diverse opportunities for acceleration. Motivated by these deficiencies, we propose Sylvie, a full-graph training system that not only improves the training throughput substantially but also maintains the model quality across a wide range of GNNs. By harnessing the inherent information embedded in the graph data and model structure, Sylvie intelligently optimizes GNN training across three key dimensions: data, time, and execution. It identifies performance-relevant features of the input graph offline to guide subsequent optimizations. At runtime, Sylvie employs an online convergence-maintenance strategy that adaptively integrates GNN-specific quantization and inter-epoch asynchronous training, aligning both with the real-time training characteristics. Extensive experiments demonstrate that Sylvie outperforms existing GNN training systems by up to 17.2× for both shallow and deep GNNs, without compromising model accuracy.