Abstract: Deep neural networks (DNNs) have achieved strong results in tasks such as classification, clustering, and regression, but their performance depends heavily on the quality and quantity of the training dataset. Most DNN models perform well under the assumption that the dataset represents all aspects of the problem; in real applications, however, the dataset may need to be modified or extended for each specific use case. In particular, adding data for entirely new classes to an already well-organized dataset remains a difficult challenge. In this paper, we present a simple yet effective way to add new-class data to a dataset, especially when its amount is very small compared to the existing data. First, we examine how performance is affected when a small amount of new-class data is added to the dataset as-is. Then, to improve the model's performance on this small amount of new data, we apply an upsampling method that balances the newly added data against the original data. Upsampling is a data-inflation technique that equalizes the number of samples per class so that information from under-represented classes is not neglected. We describe and analyze experiments on MNIST (a handwritten-digit dataset) and MSCOCO (an image-captioning dataset) and show how our upsampling approach improves model performance.
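As a rough illustration of the upsampling idea summarized above (a minimal sketch, not the paper's exact procedure), the snippet below randomly duplicates samples of a small new class until its count matches the per-class size of the existing dataset. The function name `upsample_to` and the variables `new_x`/`new_y` are hypothetical placeholders.

```python
import numpy as np

def upsample_to(x, y, target_count, seed=None):
    """Randomly duplicate (x, y) samples with replacement until target_count is reached."""
    rng = np.random.default_rng(seed)
    n = len(x)
    if n >= target_count:
        return x, y
    # Draw indices with replacement to make up the shortfall.
    extra = rng.choice(n, size=target_count - n, replace=True)
    return np.concatenate([x, x[extra]]), np.concatenate([y, y[extra]])

# Hypothetical example: ~6000 samples per existing class, only 50 for a new class.
existing_per_class = 6000
new_x = np.zeros((50, 28, 28))   # e.g. 50 images of a newly added MNIST-like class
new_y = np.full(50, 10)          # label 10 = the new class
new_x, new_y = upsample_to(new_x, new_y, existing_per_class, seed=0)
print(new_x.shape)  # (6000, 28, 28): new class now balanced with existing classes
```

Duplicating minority-class samples in this way keeps the training loss from being dominated by the large original classes, which is the balancing effect the abstract attributes to upsampling.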