Abstract: Mixup, which linearly interpolates pairs of examples to form new samples, has been shown to be effective in image classification tasks. However, mixup has two drawbacks: it needs more training epochs to obtain a well-trained model, and it requires tuning a hyper-parameter to gain appropriate capacity. In this paper, we find that mixup constantly explores the representation space, and, inspired by the exploration-exploitation dilemma, we propose mixup Without hesitation (mWh), a concise and effective training algorithm. We show that mWh strikes a good balance between exploration and exploitation by gradually replacing mixup with basic data augmentation. It achieves a strong baseline with less training time than original mixup and without searching for the optimal hyper-parameter, i.e., mWh acts as mixup without hesitation.
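To make the idea concrete, below is a minimal sketch of standard mixup interpolation (per Zhang et al.'s formulation, with the mixing coefficient drawn from a Beta distribution) together with a hypothetical phase-out schedule in the spirit of mWh. The linear decay of the mixup probability is an illustrative assumption, not the exact schedule specified in the paper.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=1.0):
    """Standard mixup: convexly combine a batch with a shuffled copy
    of itself, with lambda drawn from Beta(alpha, alpha)."""
    lam = float(np.random.beta(alpha, alpha))
    index = torch.randperm(x.size(0))
    mixed_x = lam * x + (1.0 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_probability(epoch, total_epochs):
    """Hypothetical mWh-style schedule (assumption): the chance of
    applying mixup decays linearly, so early epochs explore via mixup
    and later epochs exploit basic data augmentation only."""
    return max(0.0, 1.0 - epoch / total_epochs)

def training_step(model, x, y, epoch, total_epochs):
    if np.random.rand() < mixup_probability(epoch, total_epochs):
        mixed_x, y_a, y_b, lam = mixup_batch(x, y)
        out = model(mixed_x)
        # Mixup loss: lambda-weighted sum of the losses on both labels.
        loss = lam * F.cross_entropy(out, y_a) \
             + (1.0 - lam) * F.cross_entropy(out, y_b)
    else:
        # Plain batch; basic augmentation happens in the data pipeline.
        loss = F.cross_entropy(model(x), y)
    return loss
```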