Rotated and Masked Image Modeling: A Superior Self-Supervised Method for Classification

Published: 01 Jan 2023, Last Modified: 06 Nov 2023. IEEE Signal Process. Lett. 2023.
Abstract: Masked image modeling (MIM) has performed excellently as a transformer-based self-supervised method via random masking and reconstruction. However, since the unmasked image patches do not participate in the loss computation, MIM cannot effectively utilize the data and wastes much computation. This drawback usually limits the learning ability of the model when pre-training on small-scale datasets. To solve this problem, we propose a novel self-supervised learning method for small-scale datasets called RotMIM. Unlike MIM, RotMIM has a different pretext task: recognizing the rotation angle applied to the unmasked patches. RotMIM can fully utilize the data and provides a stronger self-supervised signal. Moreover, to fit RotMIM, we propose a data augmentation method called FeaMix. FeaMix ensures that the mixed regions agree with RotMIM's assumption that each basic unit of semantic information in an image has the same size. This consistency guarantees clean tokenization during fine-tuning after pre-training. Our proposals outperform state-of-the-art self-supervised methods on three popular datasets: Mini-ImageNet, Caltech256, and CIFAR-100.
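For concreteness, below is a minimal sketch of how a RotMIM-style pretext batch might be prepared, based only on the abstract's description: masked patches keep their MIM reconstruction role, while each unmasked patch is rotated and labeled with its rotation angle, so every patch contributes a learning signal. The function names (`patchify`, `make_rotmim_batch`), the per-patch rotation granularity, the mask ratio, and the choice of four 90-degree rotation classes are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a RotMIM-style pretext batch (assumptions noted above; not the paper's code).
import torch

def patchify(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split (B, C, H, W) images into (B, N, C, p, p) square patches."""
    B, C, H, W = images.shape
    p = patch_size
    patches = images.unfold(2, p, p).unfold(3, p, p)   # (B, C, H//p, W//p, p, p)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C, p, p)

def make_rotmim_batch(images: torch.Tensor, patch_size: int, mask_ratio: float = 0.75):
    """Return patches, a boolean mask, and rotation labels for the unmasked patches.

    Masked patches (mask == True) are reconstruction targets as in MIM; each
    unmasked patch is rotated by k * 90 degrees and labeled with k in {0,1,2,3}.
    """
    patches = patchify(images, patch_size)             # (B, N, C, p, p)
    B, N = patches.shape[:2]
    mask = torch.rand(B, N) < mask_ratio               # True = masked (reconstruct)
    rot_labels = torch.randint(0, 4, (B, N))           # rotation class per patch
    rot_labels[mask] = -100                            # ignore_index: masked patches skip the rotation loss
    for k in range(1, 4):                              # apply the sampled k*90-degree rotations
        sel = (~mask) & (rot_labels == k)              # unmasked patches with rotation class k
        patches[sel] = torch.rot90(patches[sel], k, dims=(2, 3))  # rotate in the spatial plane
    return patches, mask, rot_labels

# Example: a batch of 8 RGB 64x64 images split into 16x16 patches.
imgs = torch.randn(8, 3, 64, 64)
patches, mask, rot_labels = make_rotmim_batch(imgs, patch_size=16)
```

Under this reading, a transformer encoder would consume the patches as tokens during pre-training, with a reconstruction loss on the masked positions and a cross-entropy loss over the rotation labels on the unmasked ones, so that no patch is wasted.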