Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images

Published: 27 Mar 2025, Last Modified: 01 May 2025 · MIDL 2025 Poster · CC BY 4.0
Keywords: Attention-guided masked image modeling, Swin, noise regularized co-distillation
TL;DR: Attention-guided masked image modeling with a noisy-teacher co-distillation framework for self-supervised learning on medical images
Abstract: Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach for extracting useful feature representations from unannotated data. However, the predominantly used random masking strategies are less effective for medical images: neighboring patches are contextually similar, so masked content leaks from visible neighbors and the SSL task becomes too easy. Hence, we propose an attention-guided masking mechanism within a co-distillation learning framework that selectively masks semantically co-occurring and discriminative patches, reducing information leakage and increasing the difficulty of SSL pretraining. However, attention-guided masking inevitably reduces the diversity of attention heads, which negatively impacts downstream task performance. To address this, we integrate a noisy teacher into the co-distillation framework (termed DAGMaN) to enable attentive masking while preserving high attention head diversity. We demonstrate the capability of DAGMaN on multiple tasks including full- and few-shot lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised clustering of organs.
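The abstract does not spell out DAGMaN's exact masking rule, but the general idea of attention-guided masking can be sketched as follows: rank patches by a teacher's attention scores and mask the most-attended ones, optionally perturbing the scores with noise (a stand-in for the noisy-teacher idea). The function name, signature, and noise model below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def attention_guided_mask(attn_scores, mask_ratio=0.5, noise_std=0.0, rng=None):
    """Hypothetical sketch: mask the patches a teacher attends to most.

    attn_scores: (num_patches,) per-patch teacher attention (assumed input).
    noise_std:   std of Gaussian noise added to scores before ranking,
                 a rough stand-in for a noise-regularized (noisy) teacher.
    Returns a boolean array; True = patch is masked.
    """
    rng = rng or np.random.default_rng(0)
    scores = attn_scores + rng.normal(0.0, noise_std, size=attn_scores.shape)
    num_patches = scores.shape[0]
    num_masked = int(round(num_patches * mask_ratio))
    # Masking the most-attended (most informative) patches makes the
    # reconstruction task harder than uniform random masking.
    top_idx = np.argsort(scores)[::-1][:num_masked]
    mask = np.zeros(num_patches, dtype=bool)
    mask[top_idx] = True
    return mask

# Example: 8 patches; patches 2 and 5 carry the most attention.
scores = np.array([0.05, 0.1, 0.9, 0.1, 0.05, 0.8, 0.2, 0.1])
mask = attention_guided_mask(scores, mask_ratio=0.25, noise_std=0.0)
# With a 25% ratio, the two most-attended patches (2 and 5) are masked.
```

In practice the attention scores would come from the teacher branch of the co-distillation framework, and the masked patches would be withheld from the student.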
Primary Subject Area: Foundation Models
Secondary Subject Area: Application: Radiology
Paper Type: Methodological Development
Registration Requirement: Yes
Submission Number: 141
