Abstract: A major challenge in medical imaging is the limited availability of large, well-annotated datasets. In the case of breast cancer detection from mammograms (BCDM), obtaining precise bounding box annotations for regions of interest is costly and labor-intensive. However, large collections of unannotated mammograms are often readily available. Motivated by this observation, we propose a self-supervised fine-tuning framework for BCDM. Traditional object detection models, designed for natural images where objects are abundant, struggle in medical imaging due to limited annotated data. To tackle this challenge, we introduce MedMask, a novel self-supervised framework that leverages masked autoencoders (MAE) with vision foundation models (VFMs) in a transformer-based architecture. We propose a customized MAE module that utilizes the transformer’s encoder and an auxiliary decoder to mask and reconstruct multi-scale feature maps, enabling efficient learning from limited annotations while capturing domain-specific features. Additionally, we leverage the zero-shot capabilities of VFMs with a proposed expert contrastive knowledge distillation technique to learn better representations. Our approach outperforms the state-of-the-art on the publicly available INBreast and DDSM datasets, achieving significant sensitivity improvements of 22% and 17%, respectively, and a 27% improvement on the RSNA-BSD1K dataset. Code is available at https://github.com/Tajamul21/MedMask.
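To make the masked feature-map reconstruction idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation); the module name, decoder depth, feature dimensions, and mask ratio are assumptions chosen for demonstration. It masks a fraction of tokens from one feature-map level and reconstructs them with a light auxiliary decoder, computing the loss only on masked positions.

```python
# Illustrative sketch of masked feature reconstruction as a self-supervised loss.
# All names, shapes, and hyperparameters here are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedFeatureReconstructor(nn.Module):
    """Masks tokens of an encoder feature map and reconstructs them with a light decoder."""

    def __init__(self, dim: int = 256, mask_ratio: float = 0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        decoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(decoder_layer, num_layers=2)
        self.head = nn.Linear(dim, dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from the detector's encoder.
        B, C, H, W = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)          # (B, N, C), N = H * W
        target = tokens.detach()                          # reconstruction target
        mask = torch.rand(B, tokens.size(1), device=feat.device) < self.mask_ratio
        masked = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        pred = self.head(self.decoder(masked))            # reconstruct all tokens
        # Reconstruction loss is computed only on the masked positions.
        return F.mse_loss(pred[mask], target[mask])


if __name__ == "__main__":
    recon = MaskedFeatureReconstructor(dim=256)
    dummy_feat = torch.randn(2, 256, 32, 32)              # stand-in for one feature-map scale
    print(recon(dummy_feat).item())
```

In a multi-scale setting this loss would typically be applied per feature-map level and summed alongside the supervised detection loss; that combination is an assumption for illustration here.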
DOI: 10.1007/978-3-032-05559-0_34