Abstract: Survival prediction requires integrating Whole Slide Images (WSIs) and genomics, a task complicated by the significant heterogeneity between the modalities and their complex inter- and intra-modal interactions. Previous methods relied on co-attention, fusing features only once after separate encoding, which is insufficient to model such a complex task given the modality heterogeneity. To address this, we propose a Biased Progressive Encoding (BPE) paradigm that performs encoding and fusion simultaneously. This paradigm uses one modality as a reference when encoding the other, fostering deep fusion of the modalities over multiple iterations, progressively reducing cross-modal disparities, and facilitating complementary interactions. Moreover, survival prediction involves biomarkers from WSIs, genomics, and their integrative analysis; key biomarkers may reside in different modalities across individuals, necessitating model flexibility. Hence, we further propose a Mixture of Multimodal Experts (MoME) layer to dynamically select tailored experts at each stage of the BPE paradigm. The experts incorporate reference information from the other modality to varying degrees, enabling a balanced or biased focus on different modalities during encoding. Experimental results demonstrate the superior performance of our method on multiple datasets, including TCGA-BLCA, TCGA-UCEC, and TCGA-LUAD. Code is available at https://github.com/BearCleverProud/MoME.
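To make the two ideas above concrete, below is a minimal PyTorch sketch, not the authors' implementation: the module names, the three-expert split, the mean-pooled gating, and all dimensions are illustrative assumptions. A MoME-style layer routes inputs to experts that incorporate the reference modality to varying degrees, and a BPE-style loop re-encodes each modality with the other as reference at every stage. See the repository above for the actual design.

```python
# Hypothetical sketch of the BPE paradigm and MoME layer described in the
# abstract; all names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfExpert(nn.Module):
    """Expert biased toward the target modality: self-attention only."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, ref):
        out, _ = self.attn(x, x, x)  # the reference modality is ignored
        return out


class CrossExpert(nn.Module):
    """Expert biased toward the reference modality: cross-attention only."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, ref):
        out, _ = self.attn(x, ref, ref)  # queries from x, keys/values from ref
        return out


class HybridExpert(nn.Module):
    """Balanced expert: self-attention followed by cross-attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, ref):
        h, _ = self.self_attn(x, x, x)
        out, _ = self.cross_attn(h, ref, ref)
        return out


class MoMELayer(nn.Module):
    """Dynamically weights experts that use the reference modality to
    varying degrees, so encoding can be balanced or biased per sample."""
    def __init__(self, dim):
        super().__init__()
        self.experts = nn.ModuleList(
            [SelfExpert(dim), HybridExpert(dim), CrossExpert(dim)]
        )
        self.gate = nn.Linear(2 * dim, len(self.experts))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, ref):
        # Gate on pooled summaries of both modalities.
        summary = torch.cat([x.mean(dim=1), ref.mean(dim=1)], dim=-1)
        weights = F.softmax(self.gate(summary), dim=-1)        # (B, E)
        outs = torch.stack([e(x, ref) for e in self.experts])  # (E, B, N, D)
        mixed = torch.einsum("be,ebnd->bnd", weights, outs)
        return self.norm(x + mixed)                            # residual


class BPEModel(nn.Module):
    """Biased progressive encoding: over several stages, each modality is
    re-encoded with the other as reference, fusing while encoding."""
    def __init__(self, dim=256, stages=3):
        super().__init__()
        self.wsi_layers = nn.ModuleList(MoMELayer(dim) for _ in range(stages))
        self.gen_layers = nn.ModuleList(MoMELayer(dim) for _ in range(stages))
        self.risk_head = nn.Linear(2 * dim, 1)

    def forward(self, wsi, gen):
        for wsi_layer, gen_layer in zip(self.wsi_layers, self.gen_layers):
            wsi = wsi_layer(wsi, ref=gen)  # encode WSI with genomics as reference
            gen = gen_layer(gen, ref=wsi)  # encode genomics with WSI as reference
        fused = torch.cat([wsi.mean(dim=1), gen.mean(dim=1)], dim=-1)
        return self.risk_head(fused)       # scalar risk score per sample


if __name__ == "__main__":
    model = BPEModel(dim=256, stages=3)
    wsi = torch.randn(2, 100, 256)   # 2 slides, 100 patch embeddings each
    gen = torch.randn(2, 6, 256)     # 2 samples, 6 genomic group embeddings
    print(model(wsi, gen).shape)     # torch.Size([2, 1])
```

Note how encoding and fusion happen in the same loop: each stage consumes the other modality's latest representation, which is the "progressive" fusion the abstract contrasts with one-shot co-attention after separate encoding.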