Foresight-England: Development of a National-Scale Generative AI Model of Patient Electronic Health Records for General Medical Event Prediction across the COVID-19 Pandemic

Published: 12 Oct 2025, Last Modified: 13 Oct 2025, GenAI4Health 2025 Poster, CC BY 4.0
Keywords: Generative AI, Electronic Health Records, Clinical Prediction, COVID-19, National-Scale Health Data, Zero-Shot, Foundation Models
TL;DR: We developed Foresight-England, the first national-scale generative AI healthcare model, trained on 54M UK electronic health records to enable zero-shot prediction of the direct and indirect effects of the COVID-19 pandemic.
Abstract: We developed Foresight-England (Foresight-E), the first national-scale generative foundation model of electronic health records (EHRs), to support COVID-19-related research, and evaluated its ability to model the direct and indirect effects of the COVID-19 pandemic. The 243M-parameter transformer decoder was trained from scratch on a cohort of 54.9 million routinely collected, de-identified, longitudinal EHRs covering primary and secondary care, national death registrations, and COVID-19 testing and vaccination data. Foresight-E models patient timelines autoregressively, enabling zero-shot generative prediction across its vocabulary of ~40,000 coded medical events. Our tokenisation scheme preserves the recorded clinical granularity of ICD-10, OPCS-4, and SNOMED CT codes while jointly encoding absolute and relative temporal context. We designed and implemented an evaluation framework covering 30-day COVID-19 hospitalisation and mortality, scored with the Brier score and the areas under the receiver operating characteristic (AUROC) and precision–recall (AUPRC) curves. We further evaluated the model's ability to capture the pandemic’s indirect effects by testing temporal generalisation on the held-out year 2023, simulating prospective deployment. We benchmarked model performance against logistic regression and XGBoost baselines on a test set of 6.1 million patients. Following concerns raised by the British Medical Association and Royal College of General Practitioners’ Joint GP IT Committee, NHS England has paused access to data for the Foresight project while a review is carried out. Because of this pause, quantitative results are not available pending the outcome of ongoing discussions. Instead, we share our strategy for tokenisation, model architecture, training, inference, and evaluation as a methodological template and a case study in the challenges of building population-scale EHR foundation models and operationalising generative AI for national health systems.
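The abstract does not give the token format, so what follows is only a minimal sketch of one way to keep codes at their recorded granularity while interleaving relative and absolute time; it is not the Foresight-E scheme. Everything here is a hypothetical assumption: the `Event` record, the `SYSTEM:code` token format, the gap bucket edges, and the `<year:...>`/`<age:...>` tokens. Relative context comes from a bucketed gap token emitted whenever the date advances; absolute context comes from calendar-year and age tokens re-emitted whenever the year changes.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    day: date     # date the code was recorded
    system: str   # coding system label, e.g. "ICD10", "OPCS4", "SNOMED" (hypothetical labels)
    code: str     # code kept verbatim, never truncated to a parent category


# Hypothetical relative-time buckets: (upper bound in days, token).
GAP_BUCKETS = [(7, "<gap:1-7d>"), (30, "<gap:8-30d>"), (365, "<gap:31-365d>")]
LONG_GAP = "<gap:>365d>"


def gap_token(days: int) -> str:
    """Map a positive gap in days to a coarse relative-time token."""
    for upper, token in GAP_BUCKETS:
        if days <= upper:
            return token
    return LONG_GAP


def age_on(birth: date, day: date) -> int:
    """Completed years of age on `day`."""
    before_birthday = (day.month, day.day) < (birth.month, birth.day)
    return day.year - birth.year - int(before_birthday)


def tokenise(birth: date, events: list[Event]) -> list[str]:
    """Flatten one patient's coded events into a token sequence (illustrative only)."""
    tokens: list[str] = []
    prev_day: date | None = None
    prev_year: int | None = None
    for ev in sorted(events, key=lambda e: e.day):  # stable sort keeps same-day order
        if prev_day is not None and ev.day > prev_day:
            tokens.append(gap_token((ev.day - prev_day).days))  # relative context
        if ev.day.year != prev_year:
            # Absolute context: calendar year plus age, re-emitted at each new year.
            tokens += [f"<year:{ev.day.year}>", f"<age:{age_on(birth, ev.day)}>"]
            prev_year = ev.day.year
        tokens.append(f"{ev.system}:{ev.code}")
        prev_day = ev.day
    return tokens


birth = date(1950, 6, 15)
history = [
    Event(date(2020, 3, 1), "SNOMED", "840539006"),  # example SNOMED CT concept
    Event(date(2020, 3, 10), "ICD10", "U07.1"),
    Event(date(2020, 3, 10), "OPCS4", "E85.1"),      # example OPCS-4 procedure code
    Event(date(2021, 1, 5), "SNOMED", "840539006"),
]
print(tokenise(birth, history))
# ['<year:2020>', '<age:69>', 'SNOMED:840539006', '<gap:8-30d>', 'ICD10:U07.1',
#  'OPCS4:E85.1', '<gap:31-365d>', '<year:2021>', '<age:70>', 'SNOMED:840539006']
```

Emitting each code verbatim (for example `ICD10:U07.1` rather than its parent category `U07`) is what preserves granularity in this sketch; the cost is a larger vocabulary. Bucketing gaps keeps the number of time tokens small, at the price of coarser relative timing.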
Submission Number: 60
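The abstract describes zero-shot prediction from an autoregressive timeline model, scored with the Brier score, AUROC and AUPRC, but gives no inference details and no results. One way to turn a generative timeline model into a risk estimate is Monte Carlo sampling: draw many continuations from a patient's history and count how often the outcome token appears within the horizon. The sketch assumes that approach; `zero_shot_risk`, the `Sampler` type and `toy_sampler` are hypothetical, and `toy_sampler` is a random stand-in, not a model. The metric functions are plain-Python versions of the standard definitions, with AUPRC computed as average precision.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sampler type: one simulated future as (days after index date, token) pairs.
Sampler = Callable[[random.Random], List[Tuple[int, str]]]


def zero_shot_risk(sample: Sampler, target: str, horizon_days: int = 30,
                   n_samples: int = 200, seed: int = 0) -> float:
    """Fraction of sampled futures in which `target` occurs within the horizon (inclusive)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        future = sample(rng)
        if any(tok == target and day <= horizon_days for day, tok in future):
            hits += 1
    return hits / n_samples


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Mann-Whitney AUROC, giving tied scores their average rank."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based mean rank of the tied block
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auprc(y: Sequence[int], p: Sequence[float]) -> float:
    """Average precision: mean precision at the rank of each positive (ties keep input order)."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            tp += 1
            total += tp / rank
    return total / sum(y)


def toy_sampler(rng: random.Random) -> List[Tuple[int, str]]:
    # Hypothetical stand-in for sampling from the model: an admission in ~30% of futures.
    return [(rng.randint(1, 60), "ADMIT")] if rng.random() < 0.3 else []


risk = zero_shot_risk(toy_sampler, "ADMIT")
y, p = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(round(brier(y, p), 6), auroc(y, p), round(auprc(y, p), 4))  # 0.158125 0.75 0.8333
```

In practice a vetted library implementation of these metrics would be preferable; the pure-Python versions only make the definitions explicit. The number of samples trades estimator variance against inference cost: with `n_samples` draws, the standard error of the risk estimate is roughly sqrt(p(1 - p) / n_samples).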