Abstract: We propose a novel regularization method that effectively trains a neural network to avoid overfitting and thereby improves performance. The core idea is to bridge the gap between the predictive distributions derived from two popular image mixture techniques, Mixup and CutMix, through a class-wise ensemble distribution. Consistent optimization toward these three distributions is performed via mutual distillation, guiding the model to alleviate over-confident predictions and to robustly learn discriminative features as classification evidence. Experiments across various image classification tasks show that our method achieves significantly better performance than previous data augmentation (Mixup+CutMix) and self-knowledge distillation (Self-KD) methods.
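To make the core idea concrete, the following is a minimal sketch of what a class-wise ensemble plus mutual-distillation objective between Mixup and CutMix branches could look like. The function name, temperature, equal-weight averaging, and symmetric KL terms are illustrative assumptions based only on the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def mutual_distillation_loss(logits_mixup: torch.Tensor,
                             logits_cutmix: torch.Tensor,
                             temperature: float = 4.0) -> torch.Tensor:
    """Hypothetical sketch: bridge the Mixup and CutMix predictive
    distributions through their class-wise ensemble via mutual distillation."""
    p_mixup = F.softmax(logits_mixup / temperature, dim=1)
    p_cutmix = F.softmax(logits_cutmix / temperature, dim=1)

    # Class-wise ensemble of the two predictive distributions
    # (assumed here to be a simple average).
    p_ensemble = 0.5 * (p_mixup + p_cutmix)

    # Each branch is pulled toward the shared ensemble, so the two
    # augmentation views regularize one another.
    kl_mixup = F.kl_div(torch.log(p_mixup + 1e-8), p_ensemble, reduction="batchmean")
    kl_cutmix = F.kl_div(torch.log(p_cutmix + 1e-8), p_ensemble, reduction="batchmean")
    return (temperature ** 2) * (kl_mixup + kl_cutmix)
```

In practice this auxiliary term would be added to the standard (mixed-label) cross-entropy losses of the Mixup and CutMix branches with some weighting coefficient; the exact combination used by the authors is not specified in the abstract.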