Keywords: Electronic Health Record, Temporal modeling, Generative pre-training, Irregularly sampled timestamps
TL;DR: We propose a clinically aligned value tokenization and time representation technique, together with a temporal generative pre-training objective, for learning from EHRs consisting of tokens with irregular timestamps.
Abstract: Electronic Health Records (EHRs) possess unique characteristics that differ significantly from natural language. However, existing models have overlooked these properties and largely relied on Natural Language Processing (NLP) approaches, resulting in suboptimal performance. To address this gap, we propose a pre-training method designed to capture the distinctive features of EHRs. First, EHRs contain both clinically critical and less informative numerical ranges. To reflect this, we introduce a Pathology-Focused Binning strategy that emphasizes values with clinical significance. Second, both absolute timestamps and relative time intervals carry important information in EHRs. To incorporate these temporal aspects, we propose a Dual-Calendar Rotary Positional Embedding (RoPE) that jointly encodes these complementary temporal signals. Third, many medical applications require modeling long-term patient interactions. Accordingly, we extend conventional next-token prediction with a Time-Conditioned Foreseeing (TCF) objective, enabling the model to forecast long-range clinical events across multiple temporal horizons. Our approach establishes the first genuine temporal generative EHR model, advancing long-range clinical forecasting. It outperforms existing EHR foundation models on seven diverse downstream tasks and enables realistic, temporally consistent EHR generation. All code and models will be made publicly available in the final version of the manuscript.
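To make the temporal-encoding idea concrete, below is a minimal sketch of how a rotary positional embedding can be driven by two continuous temporal signals (absolute timestamps and relative inter-event intervals) instead of integer positions. It is an illustration only, not the authors' implementation: the PyTorch framing, the half-and-half split of each head's channels, the frequency base of 10000, and the function names `rope_rotate` and `dual_calendar_rope` are all assumptions made for this sketch.

```python
# Illustrative sketch (not the paper's code): rotary positional embeddings
# driven by two continuous temporal signals. The channel split and names are
# assumptions for demonstration purposes.
import torch


def rope_rotate(x: torch.Tensor, t: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x by angles proportional to continuous time t.

    x: (..., seq, dim) with dim even; t: (..., seq) continuous positions
    (e.g., timestamps in days), not necessarily integer or evenly spaced.
    """
    dim = x.shape[-1]
    half = dim // 2
    # Geometric frequency schedule, analogous to standard RoPE.
    freqs = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    angles = t.unsqueeze(-1) * freqs              # (..., seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


def dual_calendar_rope(q: torch.Tensor, k: torch.Tensor,
                       abs_time: torch.Tensor, rel_time: torch.Tensor):
    """Apply RoPE twice: absolute timestamps on the first half of the head
    dimension, relative intervals since the previous event on the second half."""
    h = q.shape[-1] // 2
    q_out = torch.cat([rope_rotate(q[..., :h], abs_time),
                       rope_rotate(q[..., h:], rel_time)], dim=-1)
    k_out = torch.cat([rope_rotate(k[..., :h], abs_time),
                       rope_rotate(k[..., h:], rel_time)], dim=-1)
    return q_out, k_out


# Toy usage: one sequence of 5 clinical events with irregular timestamps (days).
q = torch.randn(1, 5, 64)
k = torch.randn(1, 5, 64)
abs_time = torch.tensor([[0.0, 0.4, 3.0, 3.1, 10.5]])
rel_time = torch.cat([abs_time[:, :1] * 0, abs_time[:, 1:] - abs_time[:, :-1]], dim=1)
q_rot, k_rot = dual_calendar_rope(q, k, abs_time, rel_time)
```

Because the rotation angles depend on continuous times rather than token indices, the inner product between a rotated query and key depends on the difference of their time inputs, which is what allows the same mechanism to handle irregularly sampled events.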
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20012