Pancreatic cancer risk prediction using deep sequential modeling of longitudinal diagnostic and medication records

Published: 13 Mar 2025, Last Modified: 18 May 2025, medarxiv, Everyone, CC BY-NC 4.0
Abstract: Background: Pancreatic ductal adenocarcinoma (PDAC) is a rare, aggressive cancer that is often diagnosed late, with low survival rates, owing to the lack of population-wide screening programs and the high cost of currently available early detection methods. Methods: To facilitate earlier treatment, we developed an AI-based tool that predicts the risk of a pancreatic cancer diagnosis within 6, 12 and 36 months of assessment, using time sequences of diagnostic and medication events from real-world electronic health records (EHRs). Trained on a large US Veterans Affairs dataset with 19,000 PDAC cases and millions of controls, the tool employs a Transformer-based model that can capture and benefit from information synergy between diagnoses and medications. Findings: Risk prediction improves when medication data are incorporated alongside diagnostic codes. For the N patients predicted to be at highest risk out of 1 million, the risk of cancer within 3 years is substantially higher than a reference estimate based on age and gender alone (standardized incidence ratio SIR = 115 to 70 for N = 1000 to 5000). Detection of the most predictive features generates clinical hypotheses, such as the role of chronic inflammatory conditions in predisposing to PDAC, or the use of specific medications that reflect a patient's health state and cancer risk. We quantify prediction bias between different socioeconomic subpopulations. Interpretation: The risk prediction tool is intended to be the first step in a three-step clinical program: identification of high-risk individuals using AI tools, followed by a stratified surveillance program for early detection and intervention, with the aim of benefiting patients and lowering health-care costs.
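The standardized incidence ratio quoted in the Findings compares the number of cancers observed in a flagged high-risk group with the number expected from a reference rate based on age and gender alone. The following is a minimal sketch of that ratio, not the authors' code: the function names, the age bands, the reference rates and the counts are all hypothetical placeholders chosen only to make the arithmetic visible.

```python
from typing import Mapping, Tuple

# Hypothetical stratum key: (age band, sex). The abstract does not specify
# the strata used for the age-and-gender reference estimate.
Stratum = Tuple[str, str]


def expected_cases(group_counts: Mapping[Stratum, int],
                   reference_rates: Mapping[Stratum, float]) -> float:
    """Cases expected if each stratum followed its reference incidence rate."""
    return sum(n * reference_rates[s] for s, n in group_counts.items())


def standardized_incidence_ratio(observed: int,
                                 group_counts: Mapping[Stratum, int],
                                 reference_rates: Mapping[Stratum, float]) -> float:
    """SIR = observed cases / cases expected from the reference rates."""
    expected = expected_cases(group_counts, reference_rates)
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed / expected


# Toy inputs (assumptions, not data from the study): 1000 flagged patients
# split over two strata, with made-up 3-year reference incidence rates.
group_counts = {("60-69", "M"): 600, ("70-79", "M"): 400}
reference_rates = {("60-69", "M"): 0.001, ("70-79", "M"): 0.002}
observed = 70  # made-up number of cancers observed in the flagged group

sir = standardized_incidence_ratio(observed, group_counts, reference_rates)
print(f"expected = {expected_cases(group_counts, reference_rates):.2f}, SIR = {sir:.1f}")
```

An SIR well above 1 means the flagged group develops cancer far more often than its age and gender profile alone would predict, which is the sense in which the top-N predictions are described as enriched.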