Lung Cancer Risk Prediction Model Trained with Multi-source Data

Shijie Sun, Hanyue Liu, Ye Wang, Hong Yu

Published: 2024, Last Modified: 11 May 2026, Venue: IJCRS (2) 2024, License: CC BY-SA 4.0

Abstract: Recent lung cancer risk prediction models, whether based on single-source or multi-source data, require the data used for prediction to be of the same kind as the data used for training. As a result, they either cannot fully use the collected multi-source data to train the model or incur a higher data cost at prediction time. If a model is trained on the gathered multi-source data but still makes predictions from single-source data, the cost to patients does not increase. In this work, cross-modal knowledge distillation is introduced to train a lung cancer risk prediction model for this purpose. However, existing cross-modal knowledge distillation techniques cannot deal with the different biases in the data sources. To solve this problem, the model performs feature extraction on a sample from multiple perspectives. To validate its efficacy, the proposed model is compared with eight baselines on the NLST dataset, which includes CT image data as well as questionnaire data. In terms of AUC, the results show that the proposed model outperforms the vanilla MLP by 10.88% and the best baseline by 2.71%. The proposed model can effectively exploit historical data, both ensuring prediction accuracy and lowering the user’s expenditure on data.

External IDs: dblp:conf/rskt/SunLWY24