Intra-Instance VICReg: Bag of Self-Supervised Image Patch Embedding Explains the Performance

22 Sept 2022 (modified: 22 Dec 2024) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: self-supervised learning, explainable machine learning, co-occurrence statistics modeling
TL;DR: We show that Siamese-network-based SSL methods essentially learn a distributed representation of image patches and aggregate them to form the instance representation.
Abstract: Recently, self-supervised learning (SSL) has achieved tremendous empirical advances in learning image representations. However, our understanding of the learned representations remains limited. This work shows that the success of the SOTA Siamese-network-based SSL approaches is primarily based on learning a distributed representation of image patches. In particular, we show that when we learn a representation only for fixed-scale image patches and aggregate the different patch representations of an image (instance), we can achieve results on par with or even better than the baseline methods on several benchmarks. Further, we show that patch representation aggregation also improves various SOTA baseline methods by a large margin. We also establish a formal connection between the Siamese-network-based SSL objective and the modeling of image patch co-occurrence statistics, which supplements the prevailing invariance perspective. By visualizing the nearest neighbors of different image patches in the embedding space and the projection space, we show that while the projection exhibits more invariance, the embedding space tends to preserve more equivariance and locality. While it is important to push the SOTA engineering frontier, we show that simplifying SOTA methods to build better understanding is also a promising direction.
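The core idea in the abstract, learning embeddings for fixed-scale image patches and aggregating them into an instance representation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed_fn` is a hypothetical patch-level encoder, and the patch size, stride, and mean-pooling aggregation are assumptions for the sketch.

```python
import numpy as np

def bag_of_patch_embeddings(image, embed_fn, patch_size=32):
    """Aggregate fixed-scale patch embeddings into one instance representation.

    image:    array of shape (C, H, W)
    embed_fn: hypothetical patch encoder mapping (C, P, P) -> (D,)
    Returns the mean of all non-overlapping patch embeddings (the "bag").
    """
    C, H, W = image.shape
    embeddings = []
    # Slide a non-overlapping window of size patch_size over the image.
    for y in range(0, H - patch_size + 1, patch_size):
        for x in range(0, W - patch_size + 1, patch_size):
            patch = image[:, y:y + patch_size, x:x + patch_size]
            embeddings.append(embed_fn(patch))
    # Average-pool the patch embeddings into the instance representation.
    return np.mean(embeddings, axis=0)

# Example with a stand-in linear encoder (for illustration only).
rng = np.random.default_rng(0)
proj = rng.standard_normal((8, 3 * 32 * 32))       # toy 8-dim embedding
encoder = lambda patch: proj @ patch.ravel()
instance_rep = bag_of_patch_embeddings(rng.standard_normal((3, 64, 64)), encoder)
```

Here a 3x64x64 image yields four 32x32 patches, so `instance_rep` is the mean of four 8-dimensional embeddings; the paper's actual encoder, patch scale, and aggregation may differ.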
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/intra-instance-vicreg-bag-of-self-supervised/code)
4 Replies