Keywords: regularization, generalization, entropy, adversarial
TL;DR: Maximizing entropy on adversarial examples can improve generalization
Abstract: Supervised classification methods that directly maximize the likelihood of the training data often overfit. This overfitting is typically mitigated by regularizing the loss function (e.g., label smoothing, weight decay) or by minimizing the same loss on new examples (e.g., data augmentation, adversarial training). In this work, we propose a complementary regularization strategy: training the model to be unconfident on examples that are generated to have ambiguous labels. We call our approach Maximum Predictive Entropy (MPE). These automatically generated examples are cheap to compute, so our method is only 30% slower than standard data augmentation. Adding MPE to existing regularization techniques, such as label smoothing, increases test accuracy by 1-3%, with larger gains in the small-data regime.
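As a rough illustration of the idea, the objective can be sketched as standard cross-entropy on labeled examples minus a predictive-entropy term on generated ambiguous examples. This is a minimal sketch, assuming a simple form of the objective; the function names, the mixing weight `lam`, and how the ambiguous examples' logits are obtained are assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class dimension.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mpe_objective(clean_logits, labels, ambig_logits, lam=0.5):
    # Cross-entropy on ordinary labeled training examples.
    p = softmax(clean_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    # Predictive entropy on generated label-ambiguous examples; subtracting
    # it from the loss rewards unconfident (near-uniform) predictions there.
    q = softmax(ambig_logits)
    entropy = -(q * np.log(q + 1e-12)).sum(axis=1).mean()
    return ce - lam * entropy
```

Under this sketch, pushing the model toward uniform outputs on the ambiguous examples lowers the objective, while confident predictions on them are penalized.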