Beyond tip of the Iceberg: Debiased Self-training for Long-tailed Semi-supervised Node Classification

Zhixun Li; Dingshuo Chen; Tong Zhao; Daixin Wang; Hongrui Liu; Zhiqiang Zhang; JUN ZHOU; Jeffrey Xu Yu

Published: 29 Jan 2025, Last Modified: 29 Jan 2025, Venue: WWW 2025 Poster, License: CC BY 4.0

Track: Graph algorithms and modeling for the Web

Keywords: Graph Neural Networks, Self-training, Class-imbalanced

TL;DR: We propose a debiased self-training framework that improves graph neural networks under class imbalance and in few-shot settings.

Abstract: Graph Neural Networks (GNNs) have achieved great success on non-Euclidean, graph-structured data and have been widely deployed in real-world applications. However, their effectiveness often degrades when the training set is class-imbalanced. Most existing studies analyze class-imbalanced node classification from a supervised learning perspective and do not fully exploit the large number of unlabeled nodes available in semi-supervised scenarios. We argue that the supervised signal is just the tip of the iceberg: a large number of unlabeled nodes have not yet been effectively utilized. In this work, we propose \texttt{IceBerg}, a debiased self-training framework that addresses the class-imbalanced and few-shot challenges of GNNs simultaneously. First, to address the Matthew effect and label distribution shift in self-training, we propose \texttt{Double Balancing}, a simple plug-and-play module that substantially improves existing baselines with just a few lines of code. Second, to enhance the long-range propagation capability of GNNs, we disentangle their propagation and transformation operations, so that weak supervision signals can propagate more effectively and alleviate the few-shot issue. Overall, we find that leveraging unlabeled nodes can significantly improve GNN performance in class-imbalanced and few-shot scenarios, and that even small, surgical modifications can yield substantial gains. Systematic experiments on benchmark datasets show that our method delivers considerable performance gains over existing class-imbalanced node classification baselines. Moreover, owing to \texttt{IceBerg}'s strong ability to leverage unsupervised signals, it also achieves state-of-the-art results in few-shot node classification. The code of \texttt{IceBerg} is available at: \url{https://anonymous.4open.science/r/IceBerg-D865/}.
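The abstract names two ingredients without giving their formulas. As a rough, non-authoritative illustration of the general ideas only, the NumPy sketch below pairs (a) APPNP-style decoupled propagation, in which a parameter-free propagation step is applied to the features before any learnable transformation, with (b) a generic class-balanced pseudo-label selection rule for self-training. Both are standard techniques swapped in for illustration: they are not IceBerg's implementation, and the selection rule is not the paper's Double Balancing module. The function names, the `k`, `alpha` and `per_class` parameters, the toy graph and the classifier probabilities are all hypothetical.

```python
import numpy as np

# Illustrative sketch (assumption): generic techniques, NOT IceBerg's code.


def sym_norm_adj(adj: np.ndarray) -> np.ndarray:
    """GCN-style normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def decoupled_propagate(x, adj, k=10, alpha=0.1):
    """Parameter-free, APPNP-style propagation:
    Z <- (1 - alpha) * A_norm @ Z + alpha * X, repeated k times.
    No weights are learned here, so k can be large (long-range propagation)
    without deepening the trainable part, which would be applied to Z afterwards."""
    a_norm = sym_norm_adj(adj)
    z = x.copy()
    for _ in range(k):
        z = (1.0 - alpha) * (a_norm @ z) + alpha * x
    return z


def balanced_pseudo_labels(probs, unlabeled, per_class):
    """Generic class-balanced selection (hypothetical helper): keep at most
    `per_class` of the most confident unlabeled nodes for each predicted class,
    so head classes cannot crowd tail classes out of the pseudo-labeled set."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    idx_parts, lbl_parts = [], []
    for c in range(probs.shape[1]):
        cand = unlabeled[preds[unlabeled] == c]
        top = cand[np.argsort(-conf[cand])[:per_class]]
        idx_parts.append(top)
        lbl_parts.append(np.full(len(top), c))
    return np.concatenate(idx_parts), np.concatenate(lbl_parts)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.eye(6)  # one-hot features, so z[i, j] is node j's weight in node i's representation
z = decoupled_propagate(x, adj, k=10, alpha=0.1)

# Hypothetical classifier outputs: class 0 plays the "head" class here.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                  [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]])
unlabeled = np.array([1, 2, 3, 4, 5])
idx, lbl = balanced_pseudo_labels(probs, unlabeled, per_class=1)
print(idx, lbl)  # → [1 4] [0 1]
```

In a complete self-training loop, a classifier (for example an MLP) would be trained on the propagated features `z`, pseudo-labels would be selected with a rule like `balanced_pseudo_labels`, and the model would be retrained on the enlarged label set. How IceBerg actually rebalances pseudo-labels is specified in the paper, not in this sketch.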

Submission Number: 1742
