Augmentation Alone Leads to Generalization

Published: 05 Mar 2024, Last Modified: 08 May 2024
ICLR 2024 R2-FM Workshop Poster
License: CC BY 4.0
Keywords: generalization theory, foundation models, data augmentation
TL;DR: We show that a sufficiently good data augmentation technique alone can lead to good generalization, and prove generalization bounds for an arbitrary encoder.
Abstract: We study self-supervised representation learning with data augmentation, such as contrastive learning and masked image/language modeling. Our main result is that a sufficiently good data augmentation technique alone can lead to good generalization, for which we prove generalization bounds for an arbitrary encoder via a model-free analysis. Our results model the upstream stage as RKHS approximation and the downstream stage as RKHS regression, where the RKHS is fully determined by the augmentation. We identify augmentation complexity as a key quantity that replaces model complexity, and we further use it to quantitatively analyze augmentations on real datasets.
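As a rough illustration of how an augmentation alone can pin down an RKHS (the overlap form below is an expository assumption, not necessarily the paper's construction): if each natural example $\bar{x} \sim \mathcal{D}$ induces an augmentation distribution $\mathcal{A}(\cdot \mid \bar{x})$ over views, one natural augmentation-induced kernel is
$$ k(x, x') \;=\; \mathbb{E}_{\bar{x} \sim \mathcal{D}}\big[\, \mathcal{A}(x \mid \bar{x})\, \mathcal{A}(x' \mid \bar{x}) \,\big], $$
whose RKHS $\mathcal{H}_k$ depends only on the data distribution $\mathcal{D}$ and the augmentation $\mathcal{A}$, not on any model class. Under this reading, the upstream stage approximates functions in $\mathcal{H}_k$ and the downstream stage performs regression in $\mathcal{H}_k$, so the relevant complexity measure comes from the augmentation rather than from the encoder architecture.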
Submission Number: 9