ContraMAE: Contrastive alignment masked autoencoder framework for cancer survival prediction

Published: 02 Dec 2024 · Last Modified: 18 Jan 2025 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: With the rapid advancement of multimodal fusion technology, integrating pathological images with genomics data has achieved promising results in cancer survival prediction. However, most existing multimodal models are not pre-trained jointly on pathology and genomics modalities, ignoring the inherent task-agnostic associations between them. While some self-supervised methods align multimodal information through pre-training objectives such as correlation and mean squared error, they lack in-depth multimodal interaction. To address these issues, we propose ContraMAE, a contrastive alignment masked autoencoder framework that fuses pathological images and genomics data for cancer survival prediction. Concretely, we introduce a contrastive objective to align the two modalities and establish their intrinsic consistency. In addition, we design two reconstruction objectives that capture the complex relationships between the modalities by mutually compensating for the information each side lacks. For survival prediction, the pathology and genomics encodings from the ContraMAE encoder are concatenated into a final representation that yields a survival risk score. Experimental results demonstrate that ContraMAE outperforms existing state-of-the-art methods on five cancer datasets sourced from The Cancer Genome Atlas (TCGA).
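The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the kind of training objectives it describes: an InfoNCE-style contrastive loss aligning paired pathology and genomics embeddings, two cross-modal reconstruction losses (here with hypothetical linear decoders standing in for the masked-autoencoder decoders), and a risk score computed from the concatenated encodings. All variable names, the linear decoders, and the single-weight risk head are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def info_nce(p, g, tau=0.1):
    """Contrastive alignment loss: matched pathology/genomics pairs
    (the diagonal of the similarity matrix) are treated as positives."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    logits = p @ g.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
B, d = 4, 8
path_emb = rng.normal(size=(B, d))  # pathology encodings (hypothetical)
gene_emb = rng.normal(size=(B, d))  # genomics encodings (hypothetical)

# 1) Contrastive objective aligning the two modalities.
l_con = info_nce(path_emb, gene_emb)

# 2) Two reconstruction objectives: each modality predicts the other,
#    compensating for information the other side lacks (linear stand-ins).
W_pg = rng.normal(size=(d, d)) * 0.1
W_gp = rng.normal(size=(d, d)) * 0.1
l_rec = (np.mean((path_emb @ W_pg - gene_emb) ** 2)
         + np.mean((gene_emb @ W_gp - path_emb) ** 2))

loss = l_con + l_rec  # joint pre-training loss (weighting omitted)

# Downstream survival prediction: concatenate both encodings
# and map to a scalar risk score per sample.
w_risk = rng.normal(size=(2 * d,))
risk = np.concatenate([path_emb, gene_emb], axis=1) @ w_risk
```

In practice the encoders, decoders, and risk head would be trained neural networks and the reconstruction targets would be masked inputs rather than the other modality's embedding; the sketch only shows how the three losses and the concatenation-based risk score fit together.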