Abstract: Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases adversarial robustness. It perturbs input training data in the direction of other randomly chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to $k$-mixup, which perturbs $k$-batches of training points in the direction of other $k$-batches. The perturbation is done with displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the $k$-mixup case. Our empirical results show that training with $k$-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. The gains of $k$-mixup over standard mixup are generally comparable to those of standard mixup over ERM.
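The following is a minimal sketch of the $k$-mixup idea as described in the abstract, assuming a squared-Euclidean ground cost and one-hot labels; the function and variable names (e.g., `k_mixup_batch`, `alpha`) are illustrative and not the authors' implementation. For two uniform empirical measures with $k$ points each, the optimal transport plan reduces to an optimal matching, so displacement interpolation amounts to matching the two $k$-batches and linearly interpolating matched pairs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def k_mixup_batch(x1, y1, x2, y2, alpha=1.0):
    """Mix two k-batches via displacement interpolation (illustrative sketch).

    x1, x2 : (k, d) arrays of flattened inputs.
    y1, y2 : (k, c) arrays of one-hot labels.
    alpha  : Beta-distribution parameter, as in standard mixup.
    """
    # Pairwise squared-Euclidean costs between the two k-batches.
    cost = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1)  # (k, k)
    # Optimal transport between two uniform k-point measures is a permutation,
    # found here by minimum-cost bipartite matching.
    rows, cols = linear_sum_assignment(cost)

    # Interpolate each point (and its label) toward its matched partner.
    lam = np.random.beta(alpha, alpha)
    x_mix = lam * x1[rows] + (1.0 - lam) * x2[cols]
    y_mix = lam * y1[rows] + (1.0 - lam) * y2[cols]
    return x_mix, y_mix


if __name__ == "__main__":
    # Example: k = 4 points in 2-D with 3 classes.
    rng = np.random.default_rng(0)
    k, d, c = 4, 2, 3
    x1, x2 = rng.normal(size=(k, d)), rng.normal(size=(k, d))
    y1 = np.eye(c)[rng.integers(c, size=k)]
    y2 = np.eye(c)[rng.integers(c, size=k)]
    print(k_mixup_batch(x1, y1, x2, y2))
```

With $k = 1$ this reduces to standard mixup on a single pair; larger $k$ lets the matching respect cluster and manifold structure, since nearby points are preferentially matched to nearby points.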
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yann_Dauphin1
Submission Number: 532