Model Degradation Hinders Deep Graph Neural Networks

KDD 2022 (modified: 16 Nov 2022)
Abstract: Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is consistently observed when a GNN is stacked with many layers. As a result, most GNNs only have shallow architectures, which limits their expressive power and their exploitation of deep neighborhoods. Most recent studies attribute the performance degradation of deep GNNs to the over-smoothing issue. In this paper, we disentangle the conventional graph convolution operation into two independent operations: Propagation (P) and Transformation (T). Accordingly, the depth of a GNN can be split into the propagation depth (Dp) and the transformation depth (Dt). Through extensive experiments, we find that the major cause of the performance degradation of deep GNNs is the model degradation issue caused by large Dt, rather than the over-smoothing issue, which is mainly caused by large Dp. Further, we present Adaptive Initial Residual (AIR), a plug-and-play module compatible with all kinds of GNN architectures, to alleviate the model degradation issue and the over-smoothing issue simultaneously. Experimental results on six real-world datasets demonstrate that GNNs equipped with AIR outperform most GNNs with shallow architectures owing to the benefits of both large Dp and large Dt, while the additional time cost introduced by AIR is negligible.
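
The abstract does not give AIR's exact formulation, so the following is only a minimal, hypothetical PyTorch-style sketch of the decoupled view it describes: Dp rounds of Propagation (P) mixed with the initial features through an assumed learned gate (standing in for the "adaptive initial residual" idea), followed by Dt Transformation (T) layers. The class name, gate form, and normalization are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of decoupled Propagation (P) / Transformation (T) with an
# adaptive initial-residual connection. X is an (N, F) node-feature matrix and
# a_hat is an (N, N) normalized adjacency matrix. This is NOT the paper's code;
# the sigmoid gate below is an assumed illustration of the initial-residual idea.
import torch
import torch.nn as nn


class DecoupledGNNWithAIR(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, dp, dt):
        super().__init__()
        # Transformation depth Dt: a stack of dt linear layers
        dims = [in_dim] + [hidden_dim] * (dt - 1) + [out_dim]
        self.transforms = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(dt)
        )
        # Propagation depth Dp: one gate per propagation step (assumed form)
        self.gates = nn.ModuleList(nn.Linear(in_dim * 2, 1) for _ in range(dp))

    def forward(self, x, a_hat):
        x0 = x  # initial representation kept for the residual connection
        h = x
        # Propagation (P): Dp smoothing steps, each adaptively mixed with x0
        for gate in self.gates:
            smoothed = a_hat @ h
            alpha = torch.sigmoid(gate(torch.cat([smoothed, x0], dim=-1)))
            h = (1 - alpha) * smoothed + alpha * x0
        # Transformation (T): Dt feature transformations, independent of the graph
        for i, lin in enumerate(self.transforms):
            h = lin(h)
            if i < len(self.transforms) - 1:
                h = torch.relu(h)
        return h


# Toy usage on a random graph: Dp and Dt can now be varied independently.
n, f = 5, 8
x = torch.randn(n, f)
a = torch.rand(n, n)
a_hat = a / a.sum(dim=1, keepdim=True)  # crude row-normalization for illustration
model = DecoupledGNNWithAIR(in_dim=f, hidden_dim=16, out_dim=3, dp=4, dt=2)
print(model(x, a_hat).shape)  # torch.Size([5, 3])
```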