A Text Data Augmentation Approach for Improving the Performance of CNN

Muhammad Abulaish, Amit Kumar Sah

2019 (modified: 01 Nov 2022), COMSNETS 2019, Readers: Everyone

Abstract: Deep learning is an emerging research area within machine learning, and its high accuracy has led many researchers to apply it in various domains, including computer vision and natural language processing. Traditional machine learning approaches require a substantial investment of time in feature engineering and related tasks. Deep learning, on the other hand, does not require features to be defined explicitly; instead, it aims to learn different representations from the data automatically. However, it needs a large, enriched corpus to train deep classification models properly. Overfitting is another challenging issue, and mitigating it also calls for an enriched corpus. In this paper, we propose a data augmentation approach that combines n-grams and latent Dirichlet allocation (LDA) to identify class-specific phrases, which are then used to enrich the underlying corpus. We evaluate the performance of a convolutional neural network (CNN) on both the original and the augmented corpus, and find that training on the augmented corpus yields lower variance and better validation accuracy than training on the original corpus. The proposed data augmentation approach appears useful for domains in which only a small corpus is available to train deep learning models.
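The abstract does not spell out how class-specific phrases are identified or how they are added to the corpus, so the following is only a minimal sketch of the n-gram half of the idea under stated assumptions. It treats an n-gram as class-specific when it occurs in exactly one class, and it augments each document by appending the most frequent such n-grams of its class. The LDA step is omitted entirely, and the function names `ngrams`, `class_specific_phrases`, and `augment`, the exclusivity rule, and the toy corpus are all hypothetical and not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def class_specific_phrases(docs, labels, n=2, top_k=1):
    """Pick, per class, the top_k most frequent n-grams seen in no other class.

    Assumption: "class-specific" means exclusive to one class. The paper may
    use a different criterion and additionally relies on LDA.
    """
    counts = {}
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        counts.setdefault(label, Counter()).update(ngrams(tokens, n))

    phrases = {}
    for label, counter in counts.items():
        exclusive = Counter({
            gram: freq
            for gram, freq in counter.items()
            if all(gram not in other
                   for other_label, other in counts.items()
                   if other_label != label)
        })
        phrases[label] = [gram for gram, _ in exclusive.most_common(top_k)]
    return phrases


def augment(docs, labels, phrases):
    """Append each class's selected phrases to every document of that class."""
    return [
        (text + " " + " ".join(phrases[label])).strip()
        for text, label in zip(docs, labels)
    ]


# Toy corpus, invented purely for illustration.
docs = [
    "great movie great acting",
    "great movie loved it",
    "terrible plot bad acting",
    "terrible plot boring",
]
labels = ["pos", "pos", "neg", "neg"]

phrases = class_specific_phrases(docs, labels, n=2, top_k=1)
augmented = augment(docs, labels, phrases)
```

In this sketch the augmented corpus has the same number of documents as the original; each document is simply lengthened with phrases characteristic of its class. Whether the paper enriches existing documents or generates new ones is not stated in the abstract.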
