Keywords: self-supervised learning, medical machine learning, multimodal learning, distillation, chest x-ray
TL;DR: PaCX-MAE is a cross-modal distillation framework that injects physiological priors into chest X-ray (CXR) encoders while remaining strictly unimodal at inference.
Abstract: Clinical diagnosis often requires combining imaging with physiological measurements, yet deployed models typically operate on unimodal data. We present $\textbf{PaCX-MAE}$, a cross-modal distillation framework that injects physiological priors into chest X-ray (CXR) encoders while remaining strictly unimodal at inference. PaCX-MAE augments in-domain masked autoencoding with a dual contrastive-predictive objective that aligns CXR representations with paired ECG and laboratory embeddings. Evaluation across nine benchmarks shows consistent improvements over a domain-specific MAE, particularly on physiology-dependent tasks (e.g., $\textbf{+2.7 AUROC}$ on MedMod; $\textbf{+6.5 F1}$ on VinDr). The method is highly label-efficient in the $\textbf{1}$% label regime and preserves anatomical fidelity, achieving parity with MAE on segmentation tasks. Zero-shot and attention analyses indicate that PaCX-MAE learns to attend to physiological indicators, such as the cardiac silhouette, a behavior absent under standard visual pretraining.
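To make the "dual contrastive-predictive objective" in the abstract more concrete, the following is a minimal NumPy sketch of one way such a loss could be formed. It is an illustration under stated assumptions, not the authors' implementation: the function names (`info_nce`, `predictive_loss`, `dual_loss`), the symmetric InfoNCE form, the linear prediction heads `W_ecg` and `W_labs`, the temperature, and the weight `lam` are all hypothetical, and the masked-autoencoding reconstruction term is omitted. It assumes the CXR, ECG, and laboratory embeddings have already been projected to a shared dimension and that row i of each batch comes from the same patient.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.1):
    # Symmetric InfoNCE: row i of `a` and row i of `b` form the positive pair;
    # all other rows in the batch act as negatives.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def predictive_loss(cxr, target, W):
    # Mean squared error between a linear prediction from the CXR embedding
    # and the paired target embedding (assumed here to come from a frozen encoder).
    return np.mean((cxr @ W - target) ** 2)

def dual_loss(cxr, ecg, labs, W_ecg, W_labs, lam=1.0):
    # Hypothetical combination: contrastive alignment plus a weighted
    # predictive term, applied to both the ECG and the laboratory modality.
    contrastive = info_nce(cxr, ecg) + info_nce(cxr, labs)
    predictive = (predictive_loss(cxr, ecg, W_ecg)
                  + predictive_loss(cxr, labs, W_labs))
    return contrastive + lam * predictive
```

In this sketch only the CXR encoder would receive gradients, so the ECG and laboratory inputs are needed during pretraining but not at inference, which matches the unimodal-inference property the abstract describes.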
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 35