Abstract: Because obtaining enough labeled data requires a large amount of time and expertise, semi-supervised learning, which utilizes both labeled and unlabeled data, has received much attention. In this paper, we present SeRe, a Sentence Recombination method that augments training data for semi-supervised text classification. SeRe exploits the similarities between sentences in different samples by grouping and recombining them, which yields rich and varied training data. SeRe generates data from three combinations: labeled, unlabeled, and mixed data. In addition, SeRe is combined with a self-training framework to iteratively improve the quality of the augmented training data. We apply SeRe to text classification tasks and conduct extensive experiments on four publicly available benchmarks. Experimental results show that SeRe achieves new state-of-the-art performance on all of them.
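The abstract does not specify how sentences are grouped or recombined, so the following is only a toy illustration of the general idea of cross-sample sentence recombination, not the authors' algorithm. It assumes, hypothetically, that similarity is token-set Jaccard overlap, that recombination swaps one sentence at a time, that only samples sharing a label (gold for labeled data, pseudo-label for unlabeled data) are recombined, and that a fixed threshold of 0.3 decides whether a swap is allowed. The names `jaccard` and `recombine` are illustrative.

```python
from itertools import permutations


def jaccard(a, b):
    """Token-set Jaccard similarity between two sentences (assumed metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def recombine(samples, threshold=0.3):
    """Hypothetical sentence recombination.

    samples: list of (sentences, label) pairs; label is a gold label for
    labeled data or a pseudo-label for unlabeled data, so labeled,
    unlabeled, and mixed pairs are all handled by the same loop.
    Returns new (sentences, label) pairs in which one sentence of a sample
    is replaced by its most similar sentence from another same-label sample.
    """
    augmented = []
    for (sents_a, label_a), (sents_b, label_b) in permutations(samples, 2):
        if label_a != label_b or not sents_b:
            continue  # assumption: only recombine within one class
        for i, sa in enumerate(sents_a):
            # Most similar sentence in the other sample (first one on ties).
            best_j, best_sim = max(
                ((j, jaccard(sa, sb)) for j, sb in enumerate(sents_b)),
                key=lambda pair: pair[1],
            )
            if best_sim >= threshold and sents_b[best_j] != sa:
                new_sents = sents_a[:i] + [sents_b[best_j]] + sents_a[i + 1:]
                augmented.append((new_sents, label_a))
    return augmented


labeled = [(["the movie was great", "acting was superb"], "pos")]
pseudo = [(["the film was great", "plot was dull"], "pos")]  # pseudo-labeled
print(recombine(labeled + pseudo))
```

In a self-training loop, one would presumably refresh the pseudo-labels with the current classifier each round and re-run the recombination on the updated pool; that loop is omitted here because the abstract gives no details about it.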