Abstract: Self-supervised learning (SSL) has emerged as a promising technique for analyzing medical modalities such as X-rays due to its ability to learn without annotations. However, conventional SSL methods face challenges in achieving semantic alignment and capturing subtle details, which limits their ability to accurately represent the underlying anatomical structures and pathological features. To address these limitations, we propose OTCXR, a novel SSL framework that leverages optimal transport (OT) to learn dense semantic invariance. By integrating OT with our innovative Cross-Viewpoint Semantics Infusion Module (CV-SIM), OTCXR enhances the model's ability to capture not only local spatial features but also global contextual dependencies across different viewpoints, improving the effectiveness of SSL for chest radiographs. Furthermore, OTCXR incorporates variance and covariance regularizations within the OT framework to prioritize clinically relevant information while suppressing less informative features. This ensures that the learned representations are comprehensive and discriminative, which is particularly beneficial for tasks such as thoracic disease diagnosis. We validate OTCXR's efficacy through comprehensive experiments on three publicly available chest X-ray datasets. Our empirical results demonstrate the superiority of OTCXR over state-of-the-art methods across all evaluated tasks, confirming its capability to learn semantically rich representations.
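As a rough illustration of the ingredients named above, the sketch below pairs an entropic (Sinkhorn) OT alignment between dense patch features of two augmented views with a VICReg-style variance/covariance regularizer. All function names, hyperparameters, and the way the terms are combined are assumptions for illustration only, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def sinkhorn_plan(cost, eps=0.05, n_iters=50):
    """Entropic-regularized OT plan for a batched cost matrix of shape (B, N, M)."""
    B, N, M = cost.shape
    mu = torch.full((B, N), 1.0 / N, device=cost.device)   # uniform source marginal
    nu = torch.full((B, M), 1.0 / M, device=cost.device)   # uniform target marginal
    K = torch.exp(-cost / eps)                              # Gibbs kernel
    v = torch.ones_like(nu)
    for _ in range(n_iters):                                # alternating marginal scaling
        u = mu / (torch.bmm(K, v.unsqueeze(-1)).squeeze(-1) + 1e-8)
        v = nu / (torch.bmm(K.transpose(1, 2), u.unsqueeze(-1)).squeeze(-1) + 1e-8)
    return u.unsqueeze(-1) * K * v.unsqueeze(1)             # transport plan (B, N, M)


def var_cov_reg(z, gamma=1.0, eps=1e-4):
    """VICReg-style variance and covariance penalties on pooled embeddings (B, D)."""
    z = z - z.mean(dim=0)
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = F.relu(gamma - std).mean()                   # keep per-dimension std above gamma
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / z.shape[1]           # decorrelate embedding dimensions
    return var_loss + cov_loss


def ssl_loss(feat_a, feat_b, lam=1.0):
    """feat_a, feat_b: dense patch features (B, N, D) from two views of the same X-ray."""
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    cost = 1.0 - torch.bmm(a, b.transpose(1, 2))            # cosine cost between patch pairs
    plan = sinkhorn_plan(cost.detach())                      # OT plan from the detached cost
    ot_align = (plan * cost).sum(dim=(1, 2)).mean()          # dense OT alignment term
    reg = var_cov_reg(feat_a.mean(dim=1)) + var_cov_reg(feat_b.mean(dim=1))
    return ot_align + lam * reg
```

In this reading, the OT plan softly matches anatomically corresponding patches across viewpoints, while the regularizer discourages collapsed or redundant feature dimensions; the hypothetical weight `lam` balances the two terms.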