Keywords: Data Augmentations, Out-of-Domain Generalization, Stochasticity, Flatness, Neural Networks, Invariance
TL;DR: We uncover mechanisms by which data augmentations regularize training, examining the relationship between augmented views and extra data, as well as the roles of invariance, stochasticity, and flatness, including under distribution shifts.
Abstract: Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that augmentations can provide nearly the same performance gains as additional data samples for in-domain generalization, and even greater gains on out-of-distribution test sets. We also find that neural networks with hard-coded invariances underperform those whose invariances are learned via data augmentations. Our experiments suggest that these generalization benefits arise from the additional stochasticity conferred by randomized augmentations, which leads to flatter minima.