For self-supervised learning, Rationality implies generalization, provably

Yamini Bansal; Gal Kaplun; Boaz Barak

For self-supervised learning, Rationality implies generalization, provably

Yamini Bansal, Gal Kaplun, Boaz Barak

Published: 12 Jan 2021, Last Modified: 22 Jun 2025ICLR 2021 PosterReaders: Everyone

Keywords: Deep Learning Theory, Generalization Bounds, Self-supervised learning, Representation learning

Abstract: We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training~data, and then fitting a simple (e.g., linear) classifier $g$ to the labels. Specifically, we show that (under the assumptions described below) the generalization gap of such classifiers tends to zero if $\mathsf{C}(g) \ll n$, where $\mathsf{C}(g)$ is an appropriately-defined measure of the simple classifier $g$'s complexity, and $n$ is the number of training samples. We stress that our bound is independent of the complexity of the representation $r$. We do not make any structural or conditional-independence assumptions on the representation-learning task, which can use the same training dataset that is later used for classification. Rather, we assume that the training procedure satisfies certain natural noise-robustness (adding small amount of label noise causes small degradation in performance) and rationality (getting the wrong label is not better than getting no label at all) conditions that widely hold across many standard architectures. We also conduct an extensive empirical study of the generalization gap and the quantities used in our assumptions for a variety of self-supervision based algorithms, including SimCLR, AMDIM and BigBiGAN, on the CIFAR-10 and ImageNet datasets. We show that, unlike standard supervised classifiers, these algorithms display small generalization gap, and the bounds we prove on this gap are often non vacuous.

One-sentence Summary: We prove and empirically demonstrate generalization bounds for algorithms that fit a simple classifier on a representation learned via self-supervision; we obtain non-vacuous bounds for such top-performing algorithms on both CIFAR-10 and ImageNet.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Code: [![github](/images/github_icon.svg) ICLR2021-rep-gen/Rationality-Generalization](https://github.com/ICLR2021-rep-gen/Rationality-Generalization) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=Srmggo3b3X6)

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [ImageNet](https://paperswithcode.com/dataset/imagenet)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/for-self-supervised-learning-rationality/code)

13 Replies

Loading