Components Beat Patches: Eigenvector Removal for Robust Masked Image Modelling

Alice Bizeul; Thomas M. Sutter; Alain Ryser; Julius von Kügelgen; Julia E Vogt

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. CC BY 4.0

Keywords: Self-supervised Representation Learning; Unsupervised Representation Learning; Visual Representation Learning

TL;DR: We propose a novel masking strategy for Masked Image Modelling approaches. The proposed method operates on principal components rather than spatial patches, leading to significant improvements in downstream image classification performance.

Abstract: Masked Image Modelling has gained prominence as a powerful self-supervised learning approach for visual representation learning by reconstructing masked-out patches of images. However, the use of random spatial masking can lead to failure cases in which the learned features are not predictive of downstream labels. In this work, we introduce a novel masking strategy that targets principal components instead of image patches. The learning task then amounts to reconstructing the information of masked-out principal components. The principal components of a dataset contain more global information than patches, such that the information shared between the masked input and the reconstruction target should involve more high-level variables of interest. This property allows principal components to offer a more meaningful masking space, which manifests in improved quality of the learned representations. We provide empirical evidence across natural and medical datasets and demonstrate substantial improvements in image classification tasks. Our method thus offers a simple and robust data-driven alternative to traditional Masked Image Modelling approaches.
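
To make the idea of masking principal components rather than patches concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the components come from a plain PCA over flattened images and that the masked subset is drawn uniformly at random; the function names `pca_basis` and `mask_components` and the `mask_ratio` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def pca_basis(images):
    """Return the dataset mean and the principal components (as rows) of flattened images."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # SVD of the centred data: the rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt


def mask_components(image, mean, components, mask_ratio, rng):
    """Split one image into a visible input and a target built from the masked components.

    Assumption: the masked subset is sampled uniformly at random; the paper
    may select components differently.
    """
    # Coordinates of the centred image in the principal-component basis.
    coeffs = components @ (image.ravel() - mean)
    n = len(coeffs)
    masked = np.zeros(n, dtype=bool)
    masked[rng.choice(n, size=int(mask_ratio * n), replace=False)] = True
    # Visible input: image rebuilt from the unmasked components only.
    visible = (coeffs * ~masked) @ components + mean
    # Reconstruction target: the part of the image carried by the masked components.
    target = (coeffs * masked) @ components
    return visible.reshape(image.shape), target.reshape(image.shape)
```

In a masked-modelling setup along these lines, a network would receive `visible` and be trained to predict `target`. Because every component spans the whole image, both the input and the target remain full-resolution images instead of disjoint spatial regions.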

Supplementary Material: zip

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10670
