Keywords: foundation models, electronic health records, marked time-to-event, pretraining loss
TL;DR: We present a marked time-to-event pretraining objective for EHR foundation models, enabling robust transfer across datasets, downstream tasks, and model architectures.
Abstract: Clinical events captured in Electronic Health Records (EHRs) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or treatment dosages. The sequential nature of EHRs, analogous to natural language, has motivated the use of next-token prediction to train prior EHR Foundation Models (FMs) over events. However, this pretraining fails to capture the full structure of EHRs, such as when events occur and the measurements associated with them. We propose ORA, a marked time-to-event pretraining objective that jointly models event timing and associated measurements. Across multiple datasets, downstream tasks, and model architectures, this objective consistently yields more generalizable representations than existing pretraining losses. Importantly, the gains extend beyond traditional classification evaluation to regression and time-to-event prediction. Beyond introducing a new FM, our results suggest a broader takeaway: pretraining objectives that account for all EHR dimensions are critical for expanding downstream capabilities and generalizability.
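To make the idea of a marked time-to-event objective concrete, the following is a minimal sketch of a generic negative log-likelihood that scores both when each event type occurs and the measurement (the "mark") attached to it. It assumes a constant-hazard (exponential) time model with right-censoring and a Gaussian model for numeric marks. The names `EventTarget`, `EventPrediction`, and `marked_tte_nll` are hypothetical, and this is an illustration of the general technique, not the ORA loss itself, whose exact form is not given here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EventTarget:
    """Hypothetical supervision target for one event type at one prediction point."""
    time: float                    # time until the event occurs, or until censoring
    observed: bool                 # True if the event occurred before censoring
    value: Optional[float] = None  # associated measurement (e.g., a lab value), if any


@dataclass
class EventPrediction:
    """Hypothetical model outputs for one event type (assumed parameterization)."""
    rate: float        # exponential hazard rate lambda, must be > 0
    mean: float = 0.0  # predicted mean of the measurement
    std: float = 1.0   # predicted standard deviation of the measurement, must be > 0


def marked_tte_nll(preds: List[EventPrediction], targets: List[EventTarget]) -> float:
    """Sum of negative log-likelihoods over event types (a generic sketch, not ORA)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    total = 0.0
    for p, y in zip(preds, targets):
        if p.rate <= 0 or p.std <= 0:
            raise ValueError("rate and std must be positive")
        # Time component under an exponential model:
        #   observed event -> log f(t) = log(lambda) - lambda * t
        #   censored       -> log S(t) = -lambda * t
        log_lik = -p.rate * y.time
        if y.observed:
            log_lik += math.log(p.rate)
            # Mark component: Gaussian log-density of the measurement,
            # only scored when the event (and its value) was actually observed.
            if y.value is not None:
                z = (y.value - p.mean) / p.std
                log_lik += -0.5 * math.log(2 * math.pi * p.std ** 2) - 0.5 * z * z
        total -= log_lik
    return total


if __name__ == "__main__":
    preds = [EventPrediction(rate=0.5, mean=1.0, std=1.0), EventPrediction(rate=0.1)]
    targets = [EventTarget(time=2.0, observed=True, value=1.0),
               EventTarget(time=3.0, observed=False)]
    print(marked_tte_nll(preds, targets))
```

In this sketch, a censored event type still contributes through its survival term, which is what lets a time-to-event objective learn from patients whose follow-up ends before the event, something plain next-token prediction does not model explicitly.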
Submission Number: 63