TextSMatch: Safe Semi-supervised Text Classification with Domain Adaption

Published: 01 Jan 2022, Last Modified: 19 May 2025 · NCAA (1) 2022 · CC BY-SA 4.0
Abstract: The performance of many efficient deep semi-supervised learning (SSL) methods degrades severely when the distributions of unlabeled and labeled data do not match. Some recent approaches down-weight or even discard out-of-distribution (OOD) data, which loses the potential value of that data. We propose TextSMatch, a simple, safe, and effective SSL method for text classification that addresses this issue by recycling OOD data near the labeled domain so that the information it contains is fully used. Specifically, adversarial domain adaptation projects OOD data into the space of in-distribution (ID) and labeled data, and its recyclability is assessed via transfer probabilities. Moreover, TextSMatch unifies mainstream SSL techniques: in addition to consistency-regularization training on the class probabilities of unlabeled data and its augmentations, we regularize the embedding structure with contrastive learning based on pseudo-labels. TextSMatch significantly outperforms baseline methods on the AG News and Yelp datasets under class mismatch and with varying amounts of labeled data.
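To make the consistency-regularization component of the abstract concrete, the sketch below shows a generic FixMatch-style unlabeled loss: pseudo-labels from weakly augmented inputs supervise predictions on strongly augmented inputs, masked by a confidence threshold. This is only an illustrative assumption of how such a term is commonly implemented, not the authors' code; the names `model`, `weak_aug`, `strong_aug`, and the threshold `tau` are hypothetical, and the adversarial domain adaptation and contrastive terms described in the abstract are not shown.

```python
# Minimal sketch (assumed, not the paper's implementation) of a
# confidence-thresholded consistency-regularization loss for unlabeled text.
import torch
import torch.nn.functional as F


def consistency_loss(model, x_unlabeled, weak_aug, strong_aug, tau=0.95):
    """Cross-entropy between pseudo-labels from weak augmentations and
    predictions on strong augmentations, masked by prediction confidence."""
    with torch.no_grad():
        # Pseudo-labels come from the weakly augmented view.
        probs_weak = torch.softmax(model(weak_aug(x_unlabeled)), dim=-1)
        conf, pseudo_labels = probs_weak.max(dim=-1)
        # Keep only examples whose pseudo-label confidence exceeds tau.
        mask = (conf >= tau).float()
    # The strongly augmented view is trained to match the pseudo-labels.
    logits_strong = model(strong_aug(x_unlabeled))
    per_example = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (per_example * mask).mean()
```

In a full method along these lines, this unlabeled loss would be added to the supervised cross-entropy on labeled data, with OOD examples weighted by their estimated transferability rather than being discarded.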