Abstract: Self-supervised learning (SSL) has gained traction in medical image analysis, enabling representation learning with limited labels. While contrastive SSL, built on diverse augmentations, has become dominant, we argue that applying standard augmentations, originally designed for natural images, to medical images such as chest X-rays is suboptimal. Chest X-rays possess unique anatomical structures and subtle abnormalities that differ from natural images, and preserving these during augmentation is critical for learning clinically meaningful representations. In this paper, we introduce a novel set of domain-specific contextual transformations tailored for chest X-rays, including anatomy-aware perturbations and context random masking, designed to preserve diagnostic semantics during SSL pre-training. This augmentation strategy is the first to explicitly align transformation design with radiological context, addressing a major gap in existing medical SSL approaches. Experiments on the NIH, RSNA, and SIIM datasets show that our approach yields up to a 5% improvement on downstream tasks under limited supervision, compared to standard augmentations.
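The abstract does not specify how context random masking is implemented, so the following is only a minimal illustrative sketch of the general idea: randomly masking image patches while sparing a diagnostically relevant region. The function name, parameters, and the ROI-sparing rule are all assumptions for illustration, not the paper's method.

```python
import numpy as np

def context_random_masking(image, roi_mask, patch=16, mask_ratio=0.3, rng=None):
    """Illustrative sketch (not the paper's implementation): randomly zero out
    square patches of a chest X-ray while sparing an anatomical region of
    interest (e.g. the lung fields).

    image:    2D float array (H, W)
    roi_mask: boolean array (H, W), True where anatomy must be preserved
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    out = image.copy()
    h, w = image.shape
    # Collect grid-aligned patch positions that do not touch the ROI.
    candidates = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            if not roi_mask[y:y + patch, x:x + patch].any():
                candidates.append((y, x))
    n_mask = int(len(candidates) * mask_ratio)
    if n_mask == 0:
        return out
    # Mask a random subset of the ROI-free patches.
    for idx in rng.choice(len(candidates), size=n_mask, replace=False):
        y, x = candidates[idx]
        out[y:y + patch, x:x + patch] = 0.0
    return out
```

In a contrastive pipeline, this transform would replace (or complement) generic random erasing, so that the two augmented views never destroy the anatomy the encoder is supposed to represent.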
DOI: 10.1109/lsp.2025.3585005