Keywords: ML for healthcare, cancer risk prediction, multi-cancer risk stratification, electronic health records (EHRs)
Abstract: Many cancers are diagnosed too late, so earlier detection would be of substantial benefit. Risk assessment from electronic health records (EHRs) can support efficient surveillance programs by focusing follow-up care on the patients most likely to benefit from screening and timely intervention. We present an end-to-end, multi-task transformer that predicts risk over discrete time intervals for multiple cancer types from longitudinal EHR trajectories. The model learns latent representations that capture both shared and cancer-specific features to improve performance across cancer types. We train on a large EHR dataset from the US Department of Veterans Affairs (US-VA) and evaluate performance for five cancers using positive predictive value (PPV@N) and standardized incidence ratio (SIR@N) among the N highest-risk patients, as well as area under the precision-recall curve (AUPRC), both with no data exclusion and with a 3-month data exclusion window. The results show improved performance in high-risk cohorts, indicating the model's potential utility as a clinical decision support tool for targeted patient surveillance.
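As an illustration of the two cohort metrics named in the abstract, the sketch below computes PPV@N and SIR@N for a single cancer type. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ppv_at_n` and `sir_at_n`, the per-patient `expected_rate` input (for example, a reference incidence probability for the patient's age and sex stratum over the prediction window), and the definition of SIR@N as observed cases divided by expected cases among the N highest-scored patients are all hypothetical choices made here for clarity.

```python
import numpy as np


def top_n_indices(scores, n):
    """Indices of the n highest risk scores (stable order for ties)."""
    scores = np.asarray(scores, dtype=float)
    return np.argsort(-scores, kind="stable")[:n]


def ppv_at_n(scores, labels, n):
    """Assumed PPV@N: fraction of the n highest-risk patients who are cases."""
    labels = np.asarray(labels, dtype=int)
    top = top_n_indices(scores, n)
    return labels[top].mean()


def sir_at_n(scores, labels, expected_rate, n):
    """Assumed SIR@N: observed cases among the n highest-risk patients,
    divided by the number of cases expected from reference incidence rates.
    """
    labels = np.asarray(labels, dtype=int)
    expected_rate = np.asarray(expected_rate, dtype=float)
    top = top_n_indices(scores, n)
    observed = labels[top].sum()
    expected = expected_rate[top].sum()
    return observed / expected


# Toy data, invented for illustration only.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]            # model risk scores
labels = [1, 0, 0, 0, 1, 0]                        # 1 = diagnosed in window
expected_rate = [0.1, 0.05, 0.1, 0.05, 0.2, 0.05]  # hypothetical reference rates

print(ppv_at_n(scores, labels, n=3))
print(sir_at_n(scores, labels, expected_rate, n=3))
```

An SIR@N above 1 under this definition would mean the selected cohort contains more cases than the reference rates predict, which is the property a targeted surveillance program relies on.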
Submission Number: 68