Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

Published: 16 Jan 2024, Last Modified: 05 Mar 2024 · ICLR 2024 spotlight
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Learning Theory, Representation Learning, Self-supervised Learning, Data Augmentation, RKHS Approximation, RKHS Regression
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We establish an RKHS approximation/regression framework for analyzing self-supervised pretraining based on data augmentation, and derive nonparametric learning guarantees that disentangle the effects of the model and the augmentation.
Abstract: Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built a connection between self-supervised learning and approximating the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such a representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary encoder, and it is the sum of an estimation error bound incurred by fitting a linear probe and an approximation error bound incurred by RKHS approximation. Our second bound specifically addresses the case where the encoder extracts the top-d eigenspace of a finite-sample-based approximation of the underlying RKHS. A key ingredient in our analysis is the *augmentation complexity*, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance.
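For intuition about the pipeline the abstract describes (an augmentation-induced similarity, its top-d eigenspace as the learned representation, and a linear probe fit on top), here is a minimal, self-contained sketch in Python. It is not the paper's implementation; the kernel choice, normalization, and all function names (e.g., `augmentation_kernel`, `top_d_spectral_features`) are illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's method): spectral features from
# an augmentation-induced kernel, followed by a ridge-regression linear probe.
# All names and modeling choices here are hypothetical.
import numpy as np

def augmentation_kernel(views_a, views_b, bandwidth=1.0):
    """Toy positive-pair similarity: Gaussian kernel between two augmented
    views of the same underlying samples (rows are flattened views)."""
    sq_dists = ((views_a[:, None, :] - views_b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def top_d_spectral_features(K, d):
    """Top-d eigenvectors of the symmetrized, degree-normalized kernel matrix,
    a finite-sample stand-in for the top eigenspace of the graph Laplacian."""
    K = 0.5 * (K + K.T)                       # symmetrize the cross-view kernel
    deg = K.sum(axis=1)
    K_norm = K / np.sqrt(np.outer(deg, deg))  # symmetric normalization
    _, eigvecs = np.linalg.eigh(K_norm)       # eigenvalues in ascending order
    return eigvecs[:, -d:]                    # top-d eigenspace as features

def linear_probe(features, labels, ridge=1e-3):
    """Ridge regression probe on top of the frozen spectral features."""
    d = features.shape[1]
    A = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(A, features.T @ labels)

# Usage with synthetic data: n samples, two noisy "augmented" views each.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 10))
views_a = x + 0.1 * rng.normal(size=x.shape)
views_b = x + 0.1 * rng.normal(size=x.shape)
K = augmentation_kernel(views_a, views_b)
Z = top_d_spectral_features(K, d=8)
y = x[:, 0] + 0.05 * rng.normal(size=len(x))  # synthetic downstream target
w = linear_probe(Z, y)
print("probe weights shape:", w.shape)
```

The sketch only mirrors the structure discussed in the abstract: the representation comes from a finite-sample spectral approximation, and all downstream fitting is a linear probe, so generalization depends on the augmentation-induced kernel rather than on the complexity of the encoder class.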
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3894