LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: Iterative self-training is a popular framework in weakly supervised text classification that bootstraps a deep neural classifier from heuristic pseudo-labels. The quality of pseudo-labels, especially the initial ones, is crucial to final performance, but they are inevitably noisy due to their heuristic nature, so selecting the correct ones offers huge potential for a performance boost. One straightforward solution is to select samples based on the softmax probability scores corresponding to their pseudo-labels. However, our experiments show that such methods are ineffective and unstable due to erroneously high-confidence predictions from poorly calibrated models. Recent studies on the memorization effects of deep neural models suggest that these models first memorize training samples with clean labels and only then those with noisy labels. Inspired by this observation, we propose LOPS, a novel pseudo-label selection method that takes the learning order of samples into consideration. We hypothesize that the learning order reflects the probability of wrong annotation in terms of ranking, and therefore select the samples that are learned earliest. LOPS can be viewed as a strong performance-boost plug-in for most existing weakly supervised text classification methods, as confirmed in extensive experiments on six real-world datasets.
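To make the learning-order idea concrete, below is a minimal sketch of selection by learning order. It assumes learning order is measured as the first epoch at which the classifier's prediction matches a sample's pseudo-label; the function names, the exact memorization criterion, and the per-class keep ratio are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def learning_order(epoch_preds: np.ndarray, pseudo_labels: np.ndarray) -> np.ndarray:
    """For each sample, return the first epoch whose prediction matches the
    pseudo-label -- a simple proxy for how early the sample is memorized.
    Samples never matched are assigned num_epochs (learned last, if at all).
    epoch_preds: (num_epochs, num_samples) array of predicted class ids.
    """
    num_epochs, _ = epoch_preds.shape
    matches = epoch_preds == pseudo_labels[None, :]   # (epochs, samples)
    return np.where(matches.any(axis=0),
                    matches.argmax(axis=0),           # index of first match
                    num_epochs)                       # never learned

def select_early_learned(order: np.ndarray, pseudo_labels: np.ndarray,
                         keep_ratio: float = 0.5) -> np.ndarray:
    """Per pseudo-class, keep the keep_ratio fraction of samples that were
    learned earliest; their pseudo-labels are treated as more trustworthy."""
    selected = []
    for c in np.unique(pseudo_labels):
        idx = np.where(pseudo_labels == c)[0]
        k = max(1, int(keep_ratio * len(idx)))
        selected.extend(idx[np.argsort(order[idx])[:k]])
    return np.sort(np.array(selected))

# Toy usage: predictions over 4 epochs for 6 samples with 2 pseudo-classes.
preds = np.array([[0, 1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 1, 1, 1, 1],
                  [0, 0, 1, 1, 1, 1]])
pl = np.array([0, 0, 1, 1, 1, 1])
order = learning_order(preds, pl)        # -> [0 1 0 2 0 1]
print(select_early_learned(order, pl))   # earliest-learned half of each class
```

Note the contrast with confidence-based selection: rather than thresholding softmax scores at a single checkpoint, which the abstract argues is unreliable under poor calibration, this ranks samples by when they were fit during training.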