Keywords: data augmentation, generalization, image classification
Abstract: Data augmentation by mixing samples, such as Mixup, is widely used, typically for classification tasks. However, this strategy is not always effective because of the gap between the augmented samples seen during training and the clean samples seen at test time. This gap may prevent a classifier from learning the optimal decision boundary and increase the generalization error. To overcome this problem, we propose an alternative framework called Data Interpolating Prediction (DIP). Unlike common data augmentation methods, DIP encapsulates the sample-mixing process in the hypothesis class of the classifier, so that training and test samples are treated equally. We derive a generalization bound and show that DIP reduces the original Rademacher complexity. We also demonstrate empirically that DIP can outperform Mixup.
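
To make the contrast between the two strategies concrete, the following is a minimal NumPy sketch and not the implementation used in this work. `mixup_batch` follows the standard Mixup recipe (a Beta-distributed coefficient mixes pairs of inputs and their one-hot labels during training). `interpolating_predict` illustrates one way the mixing step could be moved inside the predictor, by averaging the model output over inputs mixed with randomly drawn reference samples, so that the same procedure applies to training and test inputs. The function names, the `reference_x` pool, the `n_draws` parameter, and the Monte Carlo averaging are all assumptions made for illustration.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Standard Mixup: mix each sample with a randomly permuted partner."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # one mixing coefficient per batch
    perm = rng.permutation(len(x))        # partner index for every sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix


def interpolating_predict(model, x, reference_x, alpha=1.0, n_draws=8, rng=None):
    """Hypothetical sketch: average model outputs over mixed inputs.

    Each input is mixed with a randomly drawn reference sample before being
    passed to `model`, and the outputs are averaged over `n_draws` draws.
    The same function can be called on training and test inputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    outputs = []
    for _ in range(n_draws):
        lam = rng.beta(alpha, alpha)
        idx = rng.integers(0, len(reference_x), size=len(x))
        outputs.append(model(lam * x + (1.0 - lam) * reference_x[idx]))
    return np.mean(outputs, axis=0)
```

In this sketch, `model` is any callable that maps a batch of inputs to a batch of outputs. Mixup changes only the training data, whereas the second function changes the predictor itself, which is the distinction the abstract draws.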