PRIME: Pretraining for Patient Condition Representation with Irregular Multimodal Electronic Health Records

Published: 01 Jan 2025, Last Modified: 31 Oct 2025 · ACM Trans. Knowl. Discov. Data 2025 · CC BY-SA 4.0
Abstract: With the increasing collection of electronic health records (EHRs), deep learning has become a crucial tool for real-time treatment analysis. However, due to patient privacy concerns, labeled data are scarce, which limits end-to-end models that rely on large training sets. Self-supervised pretraining offers a promising solution. Nevertheless, applying pretraining to EHRs faces two key issues: (1) EHRs are multimodal, comprising monitoring data and recorded clinical notes. For multimodal pretraining, designing a self-supervised task that establishes cross-modal associations while preserving the information unique to each modality remains challenging. (2) Both modalities are sequential and irregular, with varying intervals between monitoring events and between note records. Aligning monitoring times with recording times is a significant obstacle to fine-grained cross-modal pretraining. Existing pretraining models either focus on a single modality or model only regular data, failing to address both issues together. To fill this gap and fully utilize unlabeled EHR data, we propose PRIME, a pretraining model that learns patient representations from unlabeled irregular multimodal EHRs. We first use a multi-element encoding module to extract patient condition snapshots from both modalities. Then, to construct multiple aligned cross-modal positive sample pairs spanning the entire treatment process from irregular data, we employ patient condition alignment modules that integrate time-aware and feature-aware components to transfer snapshots to aligned timestamps. Next, to preserve both the shared and the unique information of each modality, our decoupled representation learning strategy first uses a constraint matrix to separate shared information; we then employ contrastive cross-modal learning and reconstruction-based intra-modal learning to model the shared and the complete information, respectively. Extensive experiments on two real-world tasks demonstrate the superiority of PRIME over state-of-the-art models, especially when labels are limited.
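To make the decoupled pretraining objective concrete, below is a minimal sketch in PyTorch of the combination the abstract describes: a contrastive loss over aligned cross-modal snapshot pairs to model shared information, plus per-modality reconstruction to preserve modality-unique information. This is not the authors' implementation; all module names (`encoder_m`, `shared_proj`, etc.), the InfoNCE formulation, the plain linear projection standing in for the paper's constraint matrix, and the loss weighting `lam` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive loss: row i of z_a (monitoring) and z_b (notes) form an
    aligned positive pair; all other rows in the batch act as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                      # (B, B) cross-modal similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def pretrain_loss(monitor_snap, note_snap, encoder_m, encoder_n,
                  decoder_m, decoder_n, shared_proj, lam: float = 1.0):
    """Cross-modal contrastive learning on a shared subspace plus
    intra-modal reconstruction of the complete representation."""
    h_m = encoder_m(monitor_snap)                     # monitoring snapshot representation
    h_n = encoder_n(note_snap)                        # clinical-note snapshot representation
    # Shared components: the paper's constraint matrix is replaced here by a
    # simple learned projection for illustration.
    loss_shared = info_nce(shared_proj(h_m), shared_proj(h_n))
    # Reconstruction keeps modality-unique information in h_m and h_n.
    loss_recon = (F.mse_loss(decoder_m(h_m), monitor_snap)
                  + F.mse_loss(decoder_n(h_n), note_snap))
    return loss_shared + lam * loss_recon
```

In this sketch, the aligned positive pairs are assumed to come from the alignment modules described above, which place both modalities' snapshots at common timestamps before the contrastive loss is applied.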