Abstract: This paper investigates a change of approach to textual data augmentation for sentiment classification: switching from offline to online data modification. Instead of changing the data before training starts, transformed samples are generated and used during the training process. This makes it possible to use information about the current loss of the classifier. We compare training with augmented examples that maximize the loss, minimize the loss, or are sampled at random, and observe that the maximizing variant performs best in most cases. We use two neural network architectures and three data augmentation methods, and test them on four different datasets. Our experiments indicate that switching to online data augmentation improves the results for recurrent neural networks in all cases and for convolutional networks in some cases. The improvement in accuracy reaches 2.3% above the baseline when averaged over all datasets, and 2.25% on one of the datasets when averaged over dataset sizes.
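The loss-aware selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `select_augmented`, the `strategy` values, the toy augmenters, and the toy loss are all hypothetical and stand in for a real classifier loss and real text augmentation methods.

```python
import random
from typing import Callable, Optional, Sequence


def select_augmented(
    text: str,
    label: int,
    augmenters: Sequence[Callable[[str], str]],
    loss_fn: Callable[[str, int], float],
    strategy: str = "max",
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one augmented variant of `text` for the current training step.

    strategy (hypothetical names):
      "max"    -> variant with the highest current loss
      "min"    -> variant with the lowest current loss
      "random" -> variant chosen uniformly at random (loss is not used)
    """
    # Generate one candidate per augmentation method.
    candidates = [augment(text) for augment in augmenters]
    if strategy == "random":
        return (rng or random).choice(candidates)
    # Score every candidate with the classifier's current loss.
    losses = [loss_fn(candidate, label) for candidate in candidates]
    if strategy == "max":
        return candidates[losses.index(max(losses))]
    if strategy == "min":
        return candidates[losses.index(min(losses))]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy stand-ins (assumptions for illustration only).
TOY_AUGMENTERS = [
    lambda t: t + "!",
    lambda t: t.upper(),
    lambda t: t + " indeed",
]


def toy_loss(text: str, label: int) -> float:
    # Placeholder for a model's loss: longer text counts as "harder".
    return float(len(text))


if __name__ == "__main__":
    for strategy in ("max", "min", "random"):
        print(strategy, select_augmented("good", 1, TOY_AUGMENTERS, toy_loss, strategy))
```

In an actual training loop, `loss_fn` would evaluate the classifier in its current state, so the chosen variant can change from step to step as the model is updated; this dependence on the current model is what separates the online setting from offline augmentation.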