Mind Your Augmentation: The Key to Decoupling Dense Self-Supervised Learning

Congpei Qiu; Tong Zhang; Yanhao Wu; Wei Ke; Mathieu Salzmann; Sabine Süsstrunk

Mind Your Augmentation: The Key to Decoupling Dense Self-Supervised Learning

Congpei Qiu, Tong Zhang, Yanhao Wu, Wei Ke, Mathieu Salzmann, Sabine Süsstrunk

Published: 16 Jan 2024, Last Modified: 13 Mar 2024ICLR 2024 posterEveryoneRevisionsBibTeX

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Dense Self-supervised Learning, representation learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: we are providing novel and practical solution for enhancing Dense SSL, offering valuable insights into disentangled feature representation within the realm of self-supervised learning.

Abstract: Dense Self-Supervised Learning (SSL) creates positive pairs by building positive paired regions or points, thereby aiming to preserve local features, for example of individual objects. However, existing approaches tend to couple objects by leaking information from the neighboring contextual regions when the pairs have a limited overlap. In this paper, we first quantitatively identify and confirm the existence of such a coupling phenomenon. We then address it by developing a remarkably simple yet highly effective solution comprising a novel augmentation method, Region Collaborative Cutout (RCC), and a corresponding decoupling branch. Importantly, our design is versatile and can be seamlessly integrated into existing SSL frameworks, whether based on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs). We conduct extensive experiments, incorporating our solution into two CNN-based and two ViT-based methods, with results confirming the effectiveness of our approach. Moreover, we provide empirical evidence that our method significantly contributes to the disentanglement of feature representations among objects, both in quantitative and qualitative terms.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 945

Loading