SparseEHR: Scalable Foundation Modeling for Structured EHR via Conditional Computation

Meysam Ghaffari; Animesh Agarwal; Nina Fatehi; Lambert Leong; Thomas Linden; Reihaneh Hassanzadeh; Carlos Morato

Published: 23 May 2026, Last Modified: 13 Jun 2026, SD4H ICML 2026 Poster, CC BY 4.0

Keywords: Electronic Health Records (EHR), Structured Healthcare Data, Clinical Prediction, Longitudinal Patient Modeling, Mixture-of-Experts, Scalable Machine Learning

TL;DR: A hybrid dense–Mixture-of-Experts transformer pretrained on 50M EHRs improves zero-shot clinical prediction across health systems while reducing per-token compute.

Abstract: Structured electronic health records (EHRs) are a natural substrate for healthcare foundation models, but dense transformers remain expensive to scale across heterogeneous code vocabularies, irregular longitudinal records, and very large patient populations. We present SparseEHR, a hybrid dense-to-sparse transformer for structured EHR sequences that uses dense warm-start layers followed by mixture-of-experts (MoE) layers with top-2 routing and a shared expert pathway. SparseEHR is pretrained on longitudinal diagnosis and procedure sequences from approximately 50 million de-identified individuals in the OptumLabs Data Warehouse. In strictly zero-shot transfer to MIMIC-IV, without any fine-tuning, SparseEHR achieves 0.463 Recall@10 and 0.551 Recall@20 for next-visit ICD-10 prediction, outperforming recent public baselines. The selected hybrid configuration also reduces active parameters per token from 530M to 470M and training step time from 1.889s to 1.682s relative to an all-MoE variant, showing that conditional computation can improve transfer while lowering per-token compute for structured health data.
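
The abstract does not give implementation details, so the following is only a minimal sketch of what top-2 routing with a shared expert pathway can look like for a single token. It is not the authors' code. Everything in it is an assumption made for illustration: the function names `matvec` and `top2_moe_token`, the use of plain linear maps as experts (real MoE experts are usually feed-forward blocks), and the choice to renormalize the gate weights with a softmax over only the two selected experts.

```python
import math

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def top2_moe_token(x, router_w, expert_ws, shared_w):
    """Illustrative top-2 MoE step for one token vector x.

    router_w  : one row per expert, producing one routing logit each
    expert_ws : list of expert weight matrices (linear experts, an assumption)
    shared_w  : weight matrix of the shared expert, applied to every token
    """
    logits = matvec(router_w, x)
    # Pick the two experts with the largest routing logits.
    top2 = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:2]
    # Softmax over the two selected logits only (assumed gating scheme).
    m = max(logits[e] for e in top2)
    exps = [math.exp(logits[e] - m) for e in top2]
    total = sum(exps)
    gates = [v / total for v in exps]
    # The shared expert always contributes; routed experts add gated outputs.
    out = matvec(shared_w, x)
    for e, g in zip(top2, gates):
        y = matvec(expert_ws[e], x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top2, gates
```

Only two routed experts plus the shared expert run for each token, which is why the active parameter count per token can sit well below the total parameter count of the layer.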

Submission Number: 34
