Cost-effective Data Labelling for Graph Neural Networks

Shixun Huang; Ge Lee; Zhifeng Bao; Shirui Pan

Cost-effective Data Labelling for Graph Neural Networks

Shixun Huang, Ge Lee, Zhifeng Bao, Shirui Pan

Published: 23 Jan 2024, Last Modified: 23 May 2024TheWebConf24 OralEveryoneRevisionsBibTeX

Keywords: graph, algorithm, graph neural networks

Abstract: Active learning (AL), that aims to label limited data samples to effectively train the model, stands as a very cost-effective data labelling strategy in machine learning. Given the state-of-the-art performance GNNs have achieved in graph-based tasks, it is critical to design proper AL methods for graph neural networks (GNNs). However, existing GNN-based AL methods require considerable supervised information to guide the AL process, such as the GNN model to use, and initially labelled nodes and labels of newly selected nodes. Such dependency on supervised information limits both flexibility and scalabilty. In this paper, we propose an unsupervised, scalable and flexible AL method – it incurs low memory footprints and time cost, is flexible to the choice of underlying GNNs, and operates without requiring GNN-model-specific knowledge or labels of selected nodes. Specifically, we leverage the commonality of existing GNNs to reformulate the unsupervised AL problem as the Aggregation Involvement Maximization (AIM) problem. The objective of AIM is to maximize the involvement or participation of all nodes during the feature aggregation process of GNNs for nodes to be labelled. In this way, the aggregated features of labelled nodes can be diversified to a large extent, thereby benefiting the training of feature transformation matrices which are major trainable components in GNNs. We prove that the AIM problem is NP-hard and propose an efficient solution with theoretical guarantees. Extensive experiments on public datasets demonstrate the effectiveness, scalability and flexibility of our method. Our study is highly relevant to the track “Graph Algorithms and Modeling for the Web” since we focus one of the major listed topics "Graph Embedding and GNNs for the Web" and AL for GNNs, as an important research problem, is faced by aforementioned challenges to be tackled in this paper.

Track: Graph Algorithms and Learning for the Web

Submission Guidelines Scope: Yes

Submission Guidelines Blind: Yes

Submission Guidelines Format: Yes

Submission Guidelines Limit: Yes

Submission Guidelines Authorship: Yes

Student Author: No

Submission Number: 153

Loading