PLIClass: Weakly Supervised Text Classification with Iterative Training and Denoisy Inference

Published: 2024, Last Modified: 02 Aug 2025, Venue: ICANN (7) 2024, License: CC BY-SA 4.0
Abstract: Weakly supervised text classification uses only the class label names as supervision signals to train classifiers. Most existing methods generate pseudo-labels for iterative training with techniques such as keyword-driven methods or clustering. However, obtaining diverse samples during the iterative process and mitigating error accumulation remain significant challenges. In this paper, we propose PLIClass, a pseudo-labeling-driven iterative framework for weakly supervised text classification. The framework consists of two modules: (1) a diversified pseudo-label acquisition module that combines clustering and sampling techniques to obtain a pseudo-labeled dataset that is both accurate and diverse, and (2) a noise-resistant inference module that uses these pseudo-labels for joint model inference, thereby mitigating the error accumulation typically associated with the iterative training process. PLIClass outperforms strong baselines on five benchmark datasets and approaches the performance of methods that rely on a small number of labeled samples.
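The abstract does not specify the selection or inference rules, so the following Python sketch only illustrates one generic way the two ideas could be realized. It is not PLIClass's algorithm. `select_pseudo_labels` and `joint_predict` are hypothetical names, and the sketch assumes each document already has a score per class (for example, similarity to the class name). It groups documents by their best-scoring class with a confidence threshold, a simple stand-in for the paper's clustering step, and then samples across confidence strata for diversity. Joint inference is approximated by averaging class probabilities from models saved at several training iterations.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def select_pseudo_labels(
    scores: Sequence[Sequence[float]],
    per_class: int,
    min_conf: float,
    seed: int = 0,
) -> Dict[int, int]:
    """Hypothetical sketch: pick a confident but varied pseudo-labeled subset.

    scores[i][c] is an assumed matching score between document i and class c.
    Returns {document_index: pseudo_label}.
    """
    rng = random.Random(seed)
    # Group documents by best-scoring class (stand-in for clustering);
    # drop low-confidence documents to limit label noise.
    groups: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for i, row in enumerate(scores):
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= min_conf:
            groups[label].append((row[label], i))

    selected: Dict[int, int] = {}
    for label, members in groups.items():
        n = len(members)
        if n <= per_class:
            chosen = members
        else:
            # Split the confidence ranking into per_class disjoint strata and
            # draw one document from each, so easy and harder examples appear.
            members.sort(reverse=True)
            chosen = [
                members[rng.randrange(k * n // per_class, (k + 1) * n // per_class)]
                for k in range(per_class)
            ]
        for _, i in chosen:
            selected[i] = label
    return selected


def joint_predict(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Hypothetical sketch: average per-model class probabilities, then argmax.

    prob_sets[m][i][c] is model m's probability that document i has class c.
    """
    n_models = len(prob_sets)
    preds = []
    for doc_probs in zip(*prob_sets):  # one tuple of per-model vectors per document
        n_classes = len(doc_probs[0])
        avg = [sum(p[c] for p in doc_probs) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds
```

Sampling one document per confidence stratum, rather than keeping only the top-ranked documents, trades a little label precision for variety, which is the tension between accuracy and diversity that the abstract describes. Averaging several models' outputs is one common way to reduce the influence of any single model's accumulated errors.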