Keywords: Multimodal EHR Foundation Model, Scaling Law, Curriculum Pretraining, Foundation Models for Precision Oncology
Abstract: Precision medicine requires integrating diverse data modalities, such as structured electronic health records (EHRs), radiology images, digital pathology, and genomics, to guide treatment decisions. Yet many existing foundation models for longitudinal patient data remain unimodal, trained solely on structured codes or on imaging, which limits their clinical utility for highly complex conditions such as cancer. In this work, we present the first scaling-law study of foundation models pretrained on multimodal patient journeys, using longitudinal structured records, CT scans, and whole-slide histopathology images from 2.3M cancer patients. We train our models (MEHRT) to jointly process all modalities recorded across time via a multi-stage curriculum pretraining strategy, and introduce a new evaluation suite of six oncology prediction tasks (e.g., progression-free survival, metastasis) defined in collaboration with oncologists. MEHRT consistently outperforms state-of-the-art supervised baselines, e.g., achieving a +7% average improvement in AUROC over the best-performing baseline (CatBoost), and its performance scales with model size. Compared to its unimodal counterpart (EHRT), MEHRT shows modest yet consistent improvements in predictive accuracy and generative modeling capability, suggesting that multimodality can complement scaling. Finally, we discuss important limitations and practical lessons learned that inform the future development of multimodal EHR foundation models.
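The abstract names a multi-stage curriculum pretraining strategy without spelling it out; the following is a minimal sketch of what such a curriculum could look like, assuming modalities are introduced stage by stage. The encoder, stage schedule, dimensions, and toy data below are hypothetical stand-ins for illustration only, not the paper's MEHRT implementation.

```python
# Hypothetical sketch: staged multimodal curriculum pretraining.
# All names (TinyEncoder, the stage schedule, toy tensors) are assumptions,
# not the MEHRT codebase.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in multimodal encoder; each modality arrives as a feature vector."""
    def __init__(self, dims, hidden=64):
        super().__init__()
        # One projection per modality (e.g., structured EHR, CT, pathology).
        self.proj = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.head = nn.Linear(hidden, hidden)

    def forward(self, batch, active):
        # Fuse only the modalities enabled at the current curriculum stage.
        fused = sum(self.proj[m](batch[m]) for m in active)
        return self.head(torch.relu(fused))

dims = {"ehr": 32, "ct": 16, "path": 16}
model = TinyEncoder(dims)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Curriculum: pretrain on structured records alone, then progressively
# add imaging modalities in later stages (schedule is illustrative).
stages = [({"ehr"}, 2), ({"ehr", "ct"}, 2), ({"ehr", "ct", "path"}, 2)]

for active, epochs in stages:
    for _ in range(epochs):
        batch = {m: torch.randn(8, d) for m, d in dims.items()}  # toy data
        target = torch.randn(8, 64)                              # toy objective
        loss = nn.functional.mse_loss(model(batch, active), target)
        opt.zero_grad(); loss.backward(); opt.step()
```

The key design choice illustrated here is that a single shared backbone is optimized across all stages, so representations learned from structured records early on are reused, rather than retrained, when imaging modalities are switched in.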
Submission Number: 92