Multimodal learning for non-small cell lung cancer prognosis

Published: 01 Jan 2025 · Last Modified: 11 Apr 2025 · Biomed. Signal Process. Control. 2025 · CC BY-SA 4.0
Abstract: This paper focuses on survival time analysis for lung cancer. Despite significant progress in recent years, the performance of existing methods remains far from satisfactory. Traditional and some deep learning-based approaches to lung cancer survival analysis rely primarily on textual clinical information such as staging, age, and histology. Unlike these existing methods, which predict from a single modality, we observe that human clinicians usually consider multimodal data, such as textual clinical parameters and visual scans, when estimating survival time. Motivated by this observation, we propose Lite-ProSENet, a cross-modality network for survival analysis that simulates human decision-making. Specifically, Lite-ProSENet adopts a two-tower architecture that takes clinical parameters and CT scans as inputs to produce a survival prediction. The textual tower is responsible for modeling the clinical parameters; we build it as a light transformer using multi-head self-attention. The visual tower, ProSENet, is designed to extract features from CT scans. Its backbone is a 3D ResNet that works together with several repeatable building blocks, named 3D-SE Resblocks, for compact feature extraction. Each 3D-SE Resblock is composed of a 3D channel "Squeeze-and-Excitation" (SE) block and a temporal SE block, whose purpose is to adaptively select valuable features from the CT scans. In addition, to further filter out redundant information in the CT scans, we develop a simple yet effective frame difference mechanism, which boosts the performance of our model to new state-of-the-art results. Extensive experiments were conducted on data from 422 NSCLC patients from The Cancer Imaging Archive (TCIA). The results show that Lite-ProSENet performs favorably against all comparison methods and achieves a new state-of-the-art concordance score of 89.3%. Our code is available at: https://github.com/wangyxxjtu/Lite_ProTrans.
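The following is a minimal PyTorch sketch of the two visual-tower ideas named in the abstract: a 3D-SE Resblock that gates a residual 3D-conv branch with a channel SE block and then a temporal (slice-wise) SE block, plus a frame difference step over the slice axis. All class names, reduction ratios, the gate ordering, and the exact form of the frame difference are illustrative assumptions, not the authors' reference implementation from the linked repository.

```python
# Sketch of a 3D-SE Resblock (channel SE + temporal SE) and a frame
# difference preprocessing step, assuming input CT volumes shaped
# (batch, channels, slices, height, width). Hyperparameters are guesses.
import torch
import torch.nn as nn


class ChannelSE3D(nn.Module):
    """Squeeze over (D, H, W); excite per channel."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, max(channels // r, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(channels // r, 1), channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, D, H, W)
        w = x.mean(dim=(2, 3, 4))               # global average pool -> (B, C)
        w = self.fc(w).view(x.size(0), -1, 1, 1, 1)
        return x * w                            # reweight channels


class TemporalSE3D(nn.Module):
    """Squeeze over (C, H, W); excite per slice along the temporal/depth axis."""
    def __init__(self, depth: int, r: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(depth, max(depth // r, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(depth // r, 1), depth),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, D, H, W)
        w = x.mean(dim=(1, 3, 4))               # pool over channels/space -> (B, D)
        w = self.fc(w).view(x.size(0), 1, -1, 1, 1)
        return x * w                            # reweight slices


class SEResblock3D(nn.Module):
    """Residual 3D-conv branch gated by a channel SE then a temporal SE block."""
    def __init__(self, channels: int, depth: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
        )
        self.cse = ChannelSE3D(channels)
        self.tse = TemporalSE3D(depth)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.tse(self.cse(self.conv(x)))
        return self.relu(x + y)                 # residual connection


def frame_difference(volume):                   # volume: (B, C, D, H, W)
    """One plausible reading of the frame difference mechanism: keep only
    inter-slice changes x[d] - x[d-1], discarding redundant static content."""
    return volume[:, :, 1:] - volume[:, :, :-1]


if __name__ == "__main__":
    x = torch.randn(2, 32, 16, 64, 64)          # toy CT feature volume
    x = frame_difference(x)                     # -> (2, 32, 15, 64, 64)
    block = SEResblock3D(channels=32, depth=15)
    print(block(x).shape)                       # torch.Size([2, 32, 15, 64, 64])
```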