A Closer Look at Self-supervised Lightweight Vision Transformers

Published: 01 Feb 2023, Last Modified: 12 Mar 2024. Submitted to ICLR 2023.
Keywords: Self-supervised Learning, Vision Transformers, Lightweight Networks
Abstract: Self-supervised learning on large-scale Vision Transformers (ViTs) has achieved promising downstream performance as a pre-training method. Yet, how much such pre-training paradigms benefit lightweight ViTs remains considerably less studied. In this work, we develop and benchmark self-supervised pre-training methods, e.g., contrastive-learning-based MoCo-v3 and masked-image-modeling-based MAE, on image classification and downstream dense prediction tasks. Surprisingly, we find that with proper pre-training, even vanilla lightweight ViTs achieve performance on ImageNet comparable to previous SOTA networks with delicately designed architectures. We also point out defects of such pre-training, e.g., failing to benefit from large-scale pre-training data and showing inferior performance on data-insufficient downstream tasks. Furthermore, we analyze the effect of such pre-training by examining the properties of the layer representations and attention maps of the related models. Finally, based on these analyses, we develop a distillation strategy during pre-training that further improves downstream performance for MAE-based pre-training.
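
The abstract describes the distillation strategy only at a high level. As a minimal, hypothetical sketch (not the authors' actual implementation), one way to combine an MAE reconstruction objective with attention-map distillation from a larger pre-trained teacher might look like the following; the KL-based attention alignment, the tensor shapes, and the loss weight `alpha` are assumptions for illustration:

```python
# Hypothetical sketch: MAE reconstruction loss plus attention-map
# distillation from a pre-trained teacher. This is an assumed design,
# not the paper's exact formulation.
import torch
import torch.nn.functional as F

def mae_distill_loss(pred, target, student_attn, teacher_attn, alpha=1.0):
    """
    pred, target:   (B, N_masked, patch_dim) predicted and ground-truth
                    pixel values for the masked patches (standard MAE).
    student_attn,
    teacher_attn:   (B, heads, N, N) softmax attention maps from one
                    chosen layer of the student and teacher encoders.
    alpha:          weight on the distillation term (assumed).
    """
    # Standard MAE objective: mean squared error on masked patches only.
    recon = F.mse_loss(pred, target)

    # Distillation term: push the student's attention distribution toward
    # the (detached) teacher's via KL divergence over the key dimension.
    kl = F.kl_div(
        student_attn.clamp_min(1e-8).log(),  # log-probabilities
        teacher_attn.detach(),               # teacher is frozen
        reduction="batchmean",
    )
    return recon + alpha * kl
```

Under this reading, the reconstruction term keeps the masked-image-modeling signal while the KL term transfers the teacher's attention patterns, which the paper's analysis of attention maps motivates; where exactly the attention maps are tapped and how the term is weighted would follow the authors' implementation.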
Primary Area: Unsupervised and Self-supervised Learning
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2205.14443/code)