Keywords: Survival analysis, Multimodal learning, Learnable queries
Abstract: Leveraging multimodal data, particularly the integration of whole-slide histology images (WSIs) and transcriptomic profiles, holds great promise for improving cancer survival prediction. However, excessive redundancy in multimodal data poses a critical challenge for model optimization and can become computationally prohibitive, so methods that effectively reduce redundancy are highly desirable. While previous approaches have achieved impressive results by clustering redundant representations, they still rely on additional prior knowledge, which limits their flexibility in adapting to dynamic data changes and emerging patterns. To address this drawback, we propose SurvQ, a novel and effective approach to multimodal cancer survival analysis with learnable queries, which adaptively learns representative features in a data-driven manner, reducing redundancy while preserving critical information. Our method employs two sets of learnable query vectors that serve as a bridge between high-dimensional representations and survival prediction, capturing task-relevant features. Additionally, we introduce a multimodal mixed self-attention mechanism to enable cross-modal interactions, further enhancing information fusion. Extensive experiments on five benchmark cancer datasets demonstrate that our method consistently outperforms state-of-the-art approaches, achieving the best average performance.
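To make the learnable-query idea concrete, below is a minimal, illustrative PyTorch sketch assuming one query set per modality that attends to its feature bag, followed by joint ("mixed") self-attention over the concatenated query tokens. The class names, dimensions, and the discrete-time hazard head are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LearnableQueryPooling(nn.Module):
    """Compress a variable-length bag of features into a fixed number of
    query tokens via cross-attention (the learnable queries attend to the bag)."""

    def __init__(self, dim: int, num_queries: int, num_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim), e.g. WSI patch embeddings or gene-group embeddings
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        pooled, _ = self.attn(q, feats, feats)  # (B, num_queries, dim)
        return pooled


class SurvQSketch(nn.Module):
    """Toy two-branch model: one learnable query set per modality, then
    joint self-attention over the concatenated query tokens for fusion."""

    def __init__(self, dim: int = 256, num_queries: int = 16,
                 num_bins: int = 4, num_heads: int = 4):
        super().__init__()
        self.wsi_queries = LearnableQueryPooling(dim, num_queries, num_heads)
        self.rna_queries = LearnableQueryPooling(dim, num_queries, num_heads)
        self.mixed_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_bins)  # discrete-time hazard logits (assumed head)

    def forward(self, wsi_feats: torch.Tensor, rna_feats: torch.Tensor) -> torch.Tensor:
        tokens = torch.cat([self.wsi_queries(wsi_feats),
                            self.rna_queries(rna_feats)], dim=1)
        fused = self.mixed_attn(tokens)        # cross-modal interaction over all query tokens
        return self.head(fused.mean(dim=1))    # (B, num_bins)


if __name__ == "__main__":
    model = SurvQSketch()
    wsi = torch.randn(2, 1000, 256)  # 1000 patch embeddings per slide
    rna = torch.randn(2, 200, 256)   # 200 gene-group embeddings per case
    print(model(wsi, rna).shape)     # torch.Size([2, 4])
```

Note that the compression step is what addresses redundancy: regardless of how many patches or gene groups enter, each modality is distilled into a fixed, small set of task-driven query tokens before fusion.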
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7352