NC-ALG: Graph-Based Active Learning Under Noisy Crowd

Published: 2024, Last Modified: 11 Mar 2026, ICDE 2024, CC BY-SA 4.0
Abstract: Graph Neural Networks (GNNs) have achieved great success in various data mining tasks, but they rely heavily on a large number of annotated nodes, which requires considerable human effort. Despite the effectiveness of existing GNN-based Active Learning (AL) methods, they assume that the annotated labels are always correct, which contradicts the error-prone labeling process of a practical crowdsourcing environment. Moreover, because of this impractical assumption, existing works focus only on optimizing node selection in AL and neglect the labeling process. Therefore, we present NC-ALG, the first GNN-based AL framework that optimizes both the node selection and the node labeling process under a noisy crowd. For node selection, NC-ALG introduces a new measurement to model influence reliability and an effective influence maximization objective to select nodes. For node labeling, NC-ALG significantly reduces the labeling cost by considering the model-predicted labels and the labels of mirror nodes. To the best of our knowledge, this is the first attempt to consider GNN-based AL under a practical noisy crowd. Empirical studies on public datasets demonstrate that NC-ALG significantly outperforms existing methods in terms of labeling efficiency. Notably, NC-ALG needs only one-third of the labeling budget that the competitive baseline GRAIN requires to reach an accuracy of 70.7% on PubMed.
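
The abstract does not spell out the influence reliability measurement or the selection objective, so the following is only a minimal sketch of the general idea of reliability-aware greedy node selection, not NC-ALG's actual method. It assumes that a selected node influences itself and its one-hop neighbors, that each node carries a hypothetical scalar `reliability` (for example, the chance that its crowd label is correct), and that the objective is the total of the best reliability with which each node is reached. The names `greedy_select`, `adj`, and `reliability` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Set


def greedy_select(adj: Dict[int, Set[int]],
                  reliability: Dict[int, float],
                  budget: int) -> List[int]:
    """Greedily pick up to `budget` nodes by reliability-weighted coverage.

    Assumption (not from the paper): a selected node u influences itself
    and its one-hop neighbors with strength reliability[u].
    """
    # Best reliability with which each node is influenced so far.
    best = {v: 0.0 for v in adj}
    selected: List[int] = []

    def gain(u: int) -> float:
        # Marginal improvement in total influence if u were selected.
        reach = adj[u] | {u}
        return sum(max(reliability[u] - best[v], 0.0) for v in reach)

    for _ in range(budget):
        candidates = [u for u in adj if u not in selected]
        if not candidates:
            break
        # Break ties in favor of the smaller node id for determinism.
        u = max(candidates, key=lambda c: (gain(c), -c))
        selected.append(u)
        for v in adj[u] | {u}:
            best[v] = max(best[v], reliability[u])
    return selected


# Toy path graph 0-1-2-3-4 with made-up reliabilities.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
reliability = {0: 0.9, 1: 0.5, 2: 0.9, 3: 0.6, 4: 0.8}
picked = greedy_select(adj, reliability, budget=2)
print(picked)
```

In this toy run the first pick is the central, highly reliable node, and the second pick goes to a node that reaches a part of the graph not yet reliably covered, rather than to a neighbor of the first pick. A real system would replace the one-hop reach with the GNN's feature propagation and estimate reliability from the crowd.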