mixup: Beyond Empirical Risk Minimization

Hongyi Zhang; Moustapha Cisse; Yann N. Dauphin; David Lopez-Paz

15 Feb 2018 (modified: 22 Jun 2025), ICLR 2018 Conference Blind Submission, Readers: Everyone

Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
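The convex-combination step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `mixup_pair` and `mixup_batch` are hypothetical, labels are assumed to be one-hot (or otherwise real-valued) vectors, and drawing the mixing weight `lam` from a Beta(alpha, alpha) distribution is an assumption that the abstract itself does not state.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Return the convex combination of two (input, label) pairs.

    lam in [0, 1] weights the first example; (1 - lam) weights the second.
    Inputs and labels are plain lists of floats (labels assumed one-hot).
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=1.0, rng=None):
    """Mix every example in a batch with a randomly chosen partner.

    Assumptions (not from the abstract): lam ~ Beta(alpha, alpha), one lam
    shared by the whole batch, partners chosen by shuffling the batch indices.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(xs)))
    rng.shuffle(partners)
    mixed = [
        mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam
```

For example, `mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], 0.25)` yields the input `[3.0, 5.0]` with the soft label `[0.25, 0.75]`. Because both examples' labels are mixed with the same weight as their inputs, the network is trained toward linear behavior between the two points, which is the regularization effect the abstract describes.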

TL;DR: Training on convex combinations between random training examples and their labels improves generalization in deep neural networks

Keywords: empirical risk minimization, vicinal risk minimization, generalization, data augmentation, image classification, generative adversarial networks, adversarial examples, random labels

Code: [facebookresearch/mixup-cifar10](https://github.com/facebookresearch/mixup-cifar10) + [Papers with Code: 70 community implementations](https://paperswithcode.com/paper/?openreview=r1Ddp1-Rb)

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [CIFAR-100](https://paperswithcode.com/dataset/cifar-100), [ImageNet-A](https://paperswithcode.com/dataset/imagenet-a), [ImageNet-W](https://paperswithcode.com/dataset/imagenet-w), [Kuzushiji-MNIST](https://paperswithcode.com/dataset/kuzushiji-mnist), [SVHN](https://paperswithcode.com/dataset/svhn), [UrbanCars](https://paperswithcode.com/dataset/urbancars)

Community Implementations: [CatalyzeX: 43 code implementations](https://www.catalyzex.com/paper/mixup-beyond-empirical-risk-minimization/code)
