HeLoM: Progressive Disease Detection with Heterogeneous and Longitudinal EHRs via Memory-Augmented LLMs

ICLR 2026 Conference Submission15732 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Disease Prediction; Heterogeneous EHR; Longitudinal EHR; LLMs

TL;DR: We propose HeLoM, a memory-augmented framework that retrieves longitudinal EHR history, integrates vital signs, and imputes missing data with LLM generation and interpolation, yielding superior Type-2 diabetes prediction.

Abstract: Recent developments in large language models (LLMs) have significantly advanced healthcare applications, especially electronic health record (EHR) processing, and have shown great potential for disease prediction. EHRs are digital records of patients' medical data, including historical visits, diagnoses, lab tests, and treatments, organized across hospital visits for clinical and research use. Despite this potential, previous LLM-based methods for predicting disease from EHRs face several persistent challenges: (1) they often concatenate a short, fixed number of EHR visits (e.g., the latest five) from an individual patient and feed them to the LLM, because of either limited input context length or the LLM's limited ability to understand long contexts, which restricts disease prediction with longitudinal EHRs; (2) most prior work focuses on clinical notes and overlooks inherent properties of EHRs such as heterogeneity; and (3) EHRs are characterized by heterogeneous patterns of missingness (e.g., missing values across various vital signs). To tackle these problems, we propose HeLoM, a novel progressive memory-augmented framework with three key components, one for each challenge. For the first challenge, at the current EHR visit, HeLoM first adaptively fetches the previously refined memory (i.e., the patient's previous visits) most relevant to the current disease prediction and then refines this visit to update its memory bank. For the second challenge, we incorporate heterogeneous data, namely vital signs, from the EHR to enhance prediction performance. For the third challenge, we introduce two imputation strategies to handle missing data: one leverages LLMs to generate plausible values, and the other applies linear interpolation to estimate the missing values. Using a real-world longitudinal EHR dataset on Type-2 diabetes collected from the hospital of our institution, we show the superior performance of HeLoM in terms of both prediction accuracy and early detection. Comprehensive ablation studies underscore the importance of generating missing values from heterogeneous sources and provide insights into building reliable systems for real-world EHRs.
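
As an illustration of the second imputation strategy named in the abstract, the following is a minimal sketch of linear interpolation over a single vital-sign series. It is not the authors' implementation: the function name `interpolate_missing`, the use of `None` for missing readings, the assumption of strictly increasing visit times, and the choice to fill leading and trailing gaps with the nearest observed value are all assumptions made here for illustration.

```python
from typing import List, Optional, Sequence


def interpolate_missing(
    times: Sequence[float],
    values: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Fill missing (None) readings by linear interpolation in time.

    Assumes `times` is strictly increasing (hypothetical visit timestamps,
    e.g. days since the first visit). Gaps before the first or after the
    last observed reading are filled with the nearest observed value.
    If nothing was observed, the series is returned unchanged.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    if not observed:
        return list(values)

    filled: List[Optional[float]] = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(float(v))
            continue
        # Nearest observed reading before and after the missing visit.
        prev = next(((pt, pv) for pt, pv in reversed(observed) if pt < t), None)
        nxt = next(((nt, nv) for nt, nv in observed if nt > t), None)
        if prev is None:
            filled.append(float(nxt[1]))   # leading gap: carry backward
        elif nxt is None:
            filled.append(float(prev[1]))  # trailing gap: carry forward
        else:
            weight = (t - prev[0]) / (nxt[0] - prev[0])
            filled.append(prev[1] + weight * (nxt[1] - prev[1]))
    return filled


# Hypothetical series: three visits, the middle reading is missing.
print(interpolate_missing([0, 1, 2], [100.0, None, 110.0]))
# [100.0, 105.0, 110.0]
```

Interpolating on visit times rather than visit indices keeps the estimate sensible when visits are irregularly spaced, which is typical of longitudinal EHRs. How the paper handles gaps at the start or end of a series is not stated in the abstract.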

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 15732
