Image Compression Is an Effective Objective for Visual Representation Learning

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Visual Pre-training, Self-supervised Learning, Image Compression, Kolmogorov Complexity
TL;DR: A novel visual pre-training methodology based on image data compression, which is degradation-free and applies to various vision transformers
Abstract: Self-supervised pre-training is an effective method for initializing the weights of vision transformers. In this paper, we advocate a novel learning objective that trains the target model to reconstruct images using a minimal number of tokens. Compared to existing approaches, including contrastive learning (CL) and masked image modeling (MIM), our formulation not only offers a new, information-theoretic perspective on visual pre-training, but also alleviates the degradation dilemma that can cause training instability. The idea is implemented as Semantic Merging and Reconstruction (SMR). SMR feeds the entire image (without any degradation) into the target model, gradually reduces the number of tokens throughout the encoder, and requires the decoder to maximally recover the original image in the semantic space using the remaining tokens. We build SMR upon the vanilla ViT and two of its variants. Under the standard evaluation protocol, SMR shows favorable performance in visual pre-training and various downstream tasks. Additionally, SMR reduces pre-training time and memory consumption, making it scalable for pre-training very large vision models. Code is submitted as supplementary material and will be open-sourced.
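To make the pipeline in the abstract concrete, the sketch below illustrates an SMR-style objective under stated assumptions: the full, undegraded patch tokens pass through an encoder that progressively merges tokens, and a lightweight decoder must reconstruct the image in the semantic space of a frozen teacher from the few tokens that remain. All module names, the pairwise-averaging merge rule, the upsampling decoder, and the smooth-L1 loss are illustrative placeholders, not the authors' exact design.

```python
# Minimal sketch of an SMR-style compression objective (assumed details, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MergingEncoder(nn.Module):
    """Toy ViT-style encoder that roughly halves the token count after each block."""

    def __init__(self, dim=192, depth=4, heads=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, tokens):
        for block in self.blocks:
            tokens = block(tokens)
            # Assumed merging rule: average adjacent token pairs (a crude
            # stand-in for semantic merging), halving the token count.
            b, n, d = tokens.shape
            n -= n % 2
            tokens = tokens[:, :n].reshape(b, n // 2, 2, d).mean(dim=2)
        return tokens


class ExpandingDecoder(nn.Module):
    """Toy decoder: upsample the kept tokens back to the full length, then project."""

    def __init__(self, dim, full_len):
        super().__init__()
        self.full_len = full_len
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):
        # (B, n_kept, D) -> (B, full_len, D) via nearest-neighbour upsampling.
        up = F.interpolate(tokens.transpose(1, 2), size=self.full_len, mode="nearest")
        return self.proj(up.transpose(1, 2))


if __name__ == "__main__":
    B, N, D = 2, 196, 192                      # 14x14 patch tokens, toy width
    patch_tokens = torch.randn(B, N, D)        # entire image, no masking or cropping

    encoder = MergingEncoder(dim=D)
    decoder = ExpandingDecoder(dim=D, full_len=N)
    # Frozen "teacher" block standing in for the semantic space to reconstruct into.
    teacher = nn.TransformerEncoderLayer(D, 3, D * 4, batch_first=True).requires_grad_(False)

    kept = encoder(patch_tokens)               # e.g. 196 -> ~12 remaining tokens
    recon = decoder(kept)                      # expand back to 196 tokens
    with torch.no_grad():
        target = teacher(patch_tokens)         # semantic-space target features
    loss = F.smooth_l1_loss(recon, target)     # reconstruction-from-few-tokens objective
    loss.backward()
    print(kept.shape, loss.item())
```

Because the encoder sees the whole image rather than a masked or augmented view, the only pressure on the representation comes from having to compress the image into ever fewer tokens, which is the information-theoretic intuition the abstract appeals to.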
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2193