Abstract: Graph Neural Networks (GNNs) have recently received extensive attention due to their applicability to a wide range of tasks, including drug discovery, text classification, traffic forecasting, hardware design, and recommendation. However, GNNs face significant scalability challenges when handling large-scale graphs. Several strategies have been proposed to address these challenges, with multilevel optimization being a prominent approach. This technique hierarchically generates compact graphs through a coarsening step, applies a target algorithm (e.g., community detection) to the coarsest graph, and then projects that initial solution back onto the original input to derive the final solution. In this work, we introduce a method for graph-based text classification using GNNs. Our approach generates ten progressively smaller graphs from an input bipartite graph using the coarsening step of multilevel optimization and applies a GNN to learn node representations at various levels of granularity. Moreover, we propose a novel semi-supervised coarsening algorithm called Greedy Sorted Matching using Class and Split Information for Bipartite Graphs (GMCb). GMCb leverages class and train-test split information to select the document nodes to merge during the graph coarsening step. We perform three types of reduction, coarsening either one of the graph's partitions or both simultaneously. Our method is evaluated on eight diverse datasets using three different GNN architectures. We assess each model's performance, memory usage, and training time to understand the impact of graph reduction. Our experiments demonstrate that contracting the document nodes can improve performance while reducing memory consumption and training time.
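The sketch below is a minimal, hypothetical illustration of a greedy matching-based coarsening step for the document partition of a bipartite document-word graph, restricted by class and train-test split information as described above. It is not the authors' GMCb implementation: the pair-scoring heuristic (number of shared words), the data structures, and all function names are assumptions made purely for illustration.

```python
# Hypothetical sketch (not the authors' GMCb code): greedily match document nodes
# of a bipartite document-word graph, allowing a merge only when both documents
# belong to the training split and share the same class label.

from collections import defaultdict

def greedy_coarsen_documents(doc_words, labels, is_train):
    """
    doc_words: dict doc_id -> set of word ids (document side of the bipartite graph)
    labels:    dict doc_id -> class label
    is_train:  dict doc_id -> True if the document belongs to the training split
    Returns a dict doc_id -> super-node id describing the resulting matching.
    """
    # Score candidate document pairs by the number of words they share
    # (a simple similarity proxy used here only for illustration).
    inverted = defaultdict(list)                 # word id -> documents containing it
    for d, words in doc_words.items():
        for w in words:
            inverted[w].append(d)

    pair_scores = defaultdict(int)
    for docs in inverted.values():
        for i in range(len(docs)):
            for j in range(i + 1, len(docs)):
                a, b = sorted((docs[i], docs[j]))
                # Only training documents with the same class are allowed to merge.
                if is_train[a] and is_train[b] and labels[a] == labels[b]:
                    pair_scores[(a, b)] += 1

    # Greedy sorted matching: take best-scoring pairs first, each document at most once.
    matched = {}
    next_super = 0
    for (a, b), _ in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        if a not in matched and b not in matched:
            matched[a] = matched[b] = next_super
            next_super += 1

    # Unmatched documents (including all test documents) become singleton super-nodes.
    for d in doc_words:
        if d not in matched:
            matched[d] = next_super
            next_super += 1
    return matched


if __name__ == "__main__":
    docs = {0: {"a", "b"}, 1: {"a", "c"}, 2: {"d"}, 3: {"a", "b", "c"}}
    labels = {0: "pos", 1: "pos", 2: "neg", 3: "pos"}
    train = {0: True, 1: True, 2: True, 3: False}   # doc 3 is a test document
    print(greedy_coarsen_documents(docs, labels, train))  # merges docs 0 and 1 only
```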