Prompt-Learning for Semi-supervised Text Classification

Published: 01 Jan 2023, Last Modified: 06 Feb 2025 · WISE 2023 · CC BY-SA 4.0
Abstract: In the Semi-Supervised Text Classification (SSTC) task, the performance of SSTC-based models relies heavily on the accuracy of pseudo-labels for unlabeled data, which is hard to guarantee in real-world scenarios. Prompt-learning has recently proven effective in alleviating the low-accuracy problem caused by limited labeled data in SSTC. In this paper, we present a Pattern Exploiting Training with Unsupervised Data Augmentation (PETUDA) method to address SSTC under the limited-label setting. We first exploit the potential of PLMs using prompt learning: we convert the text classification task into a cloze-style task and use the masked-prediction ability of the PLMs to predict the categories. Then, we use a variety of data augmentation methods to enhance model performance with unlabeled data, and introduce a consistency loss into the training process to make full use of the unlabeled data. Finally, we conduct extensive experiments on three text classification benchmark datasets. Empirical results show that PETUDA consistently outperforms the baselines in all cases.
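The two key ingredients described above, cloze-style classification through a verbalizer and a consistency loss between an example and its augmented view, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the verbalizer words, labels, and toy [MASK]-position scores are assumptions, and a real system would obtain the scores from a pre-trained masked language model.

```python
import math

# Hypothetical verbalizer: maps a label word (filled into the [MASK]
# slot of a template such as "It was [MASK].") to a class name.
VERBALIZER = {"great": "positive", "terrible": "negative"}

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify(mask_scores):
    """mask_scores: PLM scores at the [MASK] position, one per verbalizer word.
    Returns the predicted class and the probability distribution over classes."""
    probs = softmax(mask_scores)
    words = list(VERBALIZER)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return VERBALIZER[words[best]], probs

def consistency_loss(p, q, eps=1e-12):
    """KL(p || q) between predictions on an example and its augmented view;
    minimizing it pushes the model toward augmentation-invariant predictions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy usage with made-up scores for the two label words.
label, p = classify([2.0, -1.0])   # original unlabeled text
_, q = classify([1.5, -0.5])       # augmented view of the same text
loss = consistency_loss(p, q)
```

In a full training loop, the supervised cross-entropy on the few labeled examples would be combined with this consistency term over the unlabeled pool.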