SDDA-MAE: Self-distillation enhanced Dual Attention Masked Autoencoder for Small-scale Medical Image Datasets

Published: 27 Apr 2024, Last Modified: 26 May 2024, MIDL 2024 Short Papers, CC BY 4.0
Keywords: MAE, Self-Distillation, Transformer, Pre-training, Small-scale Datasets
Abstract: The Masked Autoencoder (MAE) has shown promise as a self-supervised learning method on natural images, but its application to medical imaging is limited by data scarcity. To alleviate this challenge, we propose SDDA-MAE, a method for pre-training and fine-tuning directly on the target datasets, without requiring self-supervised pre-training on an additional large-scale dataset. A Dual Attention Transformer (DAT) serves as the backbone, providing enhanced spatial and channel-wise image representations. During pre-training, we employ Self-distillation (SD) to transfer knowledge from the decoder, which captures global information, to the encoder, which captures local information, thereby improving weight initialization for downstream tasks. Experimental results demonstrate that our method outperforms numerous self-supervised and supervised state-of-the-art (SOTA) methods on medical image segmentation and classification tasks, even without pre-training on larger upstream datasets.
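The abstract does not specify the exact training objective; the PyTorch sketch below illustrates one plausible way to combine the standard MAE reconstruction loss with a decoder-to-encoder self-distillation term. The cosine-similarity distillation loss, the linear projection head, and the weighting factor `sd_weight` are assumptions for illustration, not the authors' confirmed configuration.

```python
# Hypothetical sketch of MAE pre-training with a self-distillation term that
# aligns encoder features (local) with detached decoder features (global).
# Layer choices, projection heads, and loss weights are assumptions, not the
# authors' exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistilledMAE(nn.Module):
    def __init__(self, encoder, decoder, enc_dim=768, dec_dim=512,
                 patch_dim=16 * 16 * 3, sd_weight=0.1):
        super().__init__()
        self.encoder = encoder            # backbone, e.g. a dual-attention ViT
        self.decoder = decoder            # lightweight transformer decoder
        self.enc_to_dec = nn.Linear(enc_dim, dec_dim)
        self.pixel_head = nn.Linear(dec_dim, patch_dim)   # reconstruction head
        self.sd_head = nn.Linear(enc_dim, dec_dim)        # distillation projection
        self.sd_weight = sd_weight

    def forward(self, visible_tokens, mask_queries, target_patches):
        # Encode only the visible (unmasked) patches, as in standard MAE.
        enc_feats = self.encoder(visible_tokens)              # (B, N_vis, enc_dim)
        dec_in = torch.cat([self.enc_to_dec(enc_feats), mask_queries], dim=1)
        dec_feats = self.decoder(dec_in)                       # (B, N_vis + N_mask, dec_dim)

        # Standard MAE objective: reconstruct masked patches in pixel space.
        n_vis = enc_feats.shape[1]
        pred_pixels = self.pixel_head(dec_feats[:, n_vis:])    # masked positions only
        recon_loss = F.mse_loss(pred_pixels, target_patches)

        # Self-distillation: pull projected encoder features toward the
        # decoder's features at the visible positions; the decoder acts as
        # the teacher, so its features are detached from the graph.
        student = F.normalize(self.sd_head(enc_feats), dim=-1)
        teacher = F.normalize(dec_feats[:, :n_vis].detach(), dim=-1)
        sd_loss = (1 - (student * teacher).sum(dim=-1)).mean()  # cosine distance

        return recon_loss + self.sd_weight * sd_loss
```

At fine-tuning time only the encoder would be kept, so the distillation term serves purely to inject the decoder's global context into the encoder weights used for the downstream segmentation or classification head.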
Submission Number: 108