When Covariate-shifted Data Augmentation Increases Test Error And How to Fix It

Sang Michael Xie*; Aditi Raghunathan*; Fanny Yang; John C. Duchi; Percy Liang

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission

Abstract: Empirically, data augmentation sometimes reduces and sometimes increases test error, even when it only adds points whose labels come from the true conditional distribution and the hypothesis class is expressive enough to fit them. In this paper, we provide precise conditions under which data augmentation increases test error for minimum-norm estimators in linear regression. To mitigate these failure modes, we introduce X-regularization, which uses unlabeled data to regularize the parameters towards the non-augmented estimate. We prove that our new estimator never increases test error, and we show that it yields significant improvements over adversarial data augmentation on CIFAR-10.

Keywords: data augmentation, adversarial training, interpolation, overparameterized
