Abstract: Text augmentation techniques are widely used in text classification to improve classifier performance, especially in low-resource scenarios. Previous text-editing-based methods augment text non-selectively: every word is treated alike during augmentation, which may produce unsatisfactory augmented samples. In this work, we present four kinds of roles of words (ROWs) that serve different functions in text classification tasks, and design effective methods to extract these ROWs automatically from statistical and semantic perspectives. We conduct systematic experiments on which ROWs should or should not be augmented for classification tasks. From these experiments, we find instructive patterns: certain ROWs are especially suitable or unsuitable for certain augmentation operations. Guided by these patterns, we propose a set of Selective Text Augmentation (STA) operations, which significantly outperform traditional methods and show outstanding generalization performance.
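To make the contrast between selective and non-selective editing concrete, the following is a minimal sketch of random deletion that skips a caller-supplied set of protected words. It is an illustration only, not the STA operations themselves: the function names, the `protected` argument, and the deletion probability `p` are all assumptions, and how the words to protect would be chosen (the ROW extraction step) is not modeled here.

```python
import random


def selective_delete(tokens, protected, p=0.2, seed=None):
    """Drop each token with probability p, but never drop a protected token.

    `protected` is a hypothetical stand-in for whichever words a
    role-extraction step has marked as unsafe to delete.
    """
    rng = random.Random(seed)
    protected_lower = {w.lower() for w in protected}
    kept = [
        t for t in tokens
        # Protected tokens are always kept; others survive with probability 1 - p.
        if t.lower() in protected_lower or rng.random() >= p
    ]
    # Fall back to the original tokens rather than return an empty sample.
    return kept if kept else list(tokens)


def nonselective_delete(tokens, p=0.2, seed=None):
    """Baseline: every token is equally likely to be dropped."""
    return selective_delete(tokens, protected=(), p=p, seed=seed)


if __name__ == "__main__":
    sentence = "the movie was surprisingly great".split()
    print(nonselective_delete(sentence, p=0.4, seed=0))
    print(selective_delete(sentence, protected={"great"}, p=0.4, seed=0))
```

The only difference between the two functions is the protected set, which keeps the comparison between selective and non-selective augmentation to a single variable.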