Heterogeneous multiviews-based efficient graph contrastive learning model for short text classification

Published: 2025, Last Modified: 31 Jul 2025, Knowl. Based Syst. 2025, CC BY-SA 4.0
Abstract: Short Text Classification (STC) is widely applied in various industrial scenarios. However, the limited semantic content of short texts and the scarcity of labeled data hinder accurate and efficient classification in practice. Recent studies have demonstrated the effectiveness of graph contrastive learning for text classification. Nevertheless, integrating external corpora to enrich semantics often introduces noise, and graph feature compression can further degrade the original semantic information; both effects reduce classification accuracy and robustness. To address these issues, we propose a novel Heterogeneous multiviews-based Efficient Graph Contrastive Learning model (HEGCL) for STC. First, we construct heterogeneous graphs from multiple information sources at the word, entity, and tag levels, preserving the original semantics while mitigating external noise. Then, we generate enhanced feature views using a two-layer Graph Convolutional Network (GCN) and a main-term (MD) matrix derived from the original text, capturing diverse semantic aspects and alleviating information loss during feature compression. Finally, we perform multiview contrastive learning with three modules, GECL, NDCL, and CCL, to improve representation learning. Extensive experiments on six real-world datasets demonstrate that HEGCL outperforms state-of-the-art (SOTA) methods in both classification accuracy and model robustness on STC tasks. Our code is available at https://github.com/zkq454/HEGCL.
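The abstract relies on two standard building blocks: a two-layer GCN that encodes graph nodes into feature views, and a contrastive objective that pulls together representations of the same node across views. The NumPy sketch below illustrates generic versions of both, assuming the symmetric-normalized GCN propagation of Kipf and Welling and a node-level InfoNCE loss. It is not HEGCL's implementation: the GECL, NDCL, and CCL modules and the main-term (MD) matrix are not specified here, so the two views are built from two randomly generated graphs over the same nodes. The helper names (`normalize_adjacency`, `two_layer_gcn`, `info_nce`, `random_symmetric_adjacency`), the temperature of 0.5, and all dimensions are hypothetical choices for illustration.

```python
import numpy as np

# Illustrative sketch (assumption): a generic two-layer GCN plus a generic
# InfoNCE loss, NOT HEGCL's GECL/NDCL/CCL modules or its MD-matrix view.


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees include self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def two_layer_gcn(adj_norm: np.ndarray, x: np.ndarray,
                  w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two propagation steps: A_norm @ ReLU(A_norm @ X @ W1) @ W2."""
    hidden = np.maximum(adj_norm @ x @ w1, 0.0)
    return adj_norm @ hidden @ w2


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """Generic node-level InfoNCE between two views.

    Row i of z1 and row i of z2 form a positive pair; all other rows of z2
    act as negatives for row i of z1.
    """
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = (z1 @ z2.T) / temperature            # cosine similarities / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def random_symmetric_adjacency(n: int, p: float, rng) -> np.ndarray:
    """Hypothetical stand-in for one of the heterogeneous graphs."""
    upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
    return upper + upper.T


rng = np.random.default_rng(0)
n_nodes, in_dim, hid_dim, out_dim = 6, 8, 16, 4
x = rng.normal(size=(n_nodes, in_dim))
w1 = rng.normal(scale=0.1, size=(in_dim, hid_dim))
w2 = rng.normal(scale=0.1, size=(hid_dim, out_dim))

# Two views of the same nodes over two different (hypothetical) graphs,
# encoded by one shared two-layer GCN.
adj_a = normalize_adjacency(random_symmetric_adjacency(n_nodes, 0.4, rng))
adj_b = normalize_adjacency(random_symmetric_adjacency(n_nodes, 0.4, rng))
z_a = two_layer_gcn(adj_a, x, w1, w2)
z_b = two_layer_gcn(adj_b, x, w1, w2)

loss = info_nce(z_a, z_b)
print(f"InfoNCE loss between views: {loss:.4f}")
```

Sharing one set of GCN weights across views, as in this sketch, is a common choice in graph contrastive learning. In actual training, the loss would be minimized with respect to `w1` and `w2` using an autodiff framework; this forward-only sketch omits that step.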