Generating Personalized Imputations for Patient Health Status Prediction in Electronic Health Records
Abstract: Electronic health records (EHRs) play a crucial role in the development of personalized treatment plans for patients. However, EHRs are often highly incomplete, posing significant challenges for predictive modeling. While existing deep learning models employ various imputation techniques to reconstruct missing values, they often fail to represent missing data in a personalized manner and do not learn from the missingness patterns in EHR data. This limitation reduces their effectiveness in practical personalized healthcare applications. To address this issue, we propose SPIME, a self-supervised model that generates personalized imputations in EHR data for patient health status prediction. We introduce a personalized missing mask (PMM) based on the frequency of feature measurements. Additionally, we incorporate a masked imputation task (MIT) loss that minimizes the reconstruction error on artificially introduced missing values, thereby enhancing the model’s capability to handle missing data. SPIME adopts self-supervised pretraining to learn representations from personalized missing patterns and reconstructs missing data in the latent space. To further enhance representation learning, two independent attention mechanisms are applied separately across the feature and temporal dimensions. Experimental results on two real-world EHR datasets show that SPIME outperforms existing state-of-the-art methods in predicting in-hospital mortality and decompensation, demonstrating its effectiveness in reconstructing missing data and predicting patient health status. The code will be published at https://github.com/cling6666/SPIME.
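As a rough illustration of the two ingredients named in the abstract, the sketch below computes a per-patient measurement frequency for each feature and a masked-imputation loss evaluated only on artificially hidden entries. It is not the SPIME implementation: the function names, the array layout (patients, time steps, features), the uniform random hiding rate, and the use of mean absolute error are all assumptions made for illustration, and the abstract does not specify how the PMM is derived from these frequencies.

```python
import numpy as np


def measurement_frequency(observed):
    """Per-patient fraction of time steps at which each feature was measured.

    observed: 0/1 array of shape (patients, time, features); 1 = measured.
    Returns an array of shape (patients, features).
    """
    return observed.mean(axis=1)


def artificial_mask(observed, rate, rng):
    """Hide a random subset of the observed entries (assumed uniform rate).

    Returns a float array of the same shape with 1 where a value that was
    actually observed has been artificially hidden, and 0 elsewhere.
    """
    hide = rng.random(observed.shape) < rate
    return (hide & (observed == 1)).astype(float)


def masked_imputation_loss(imputed, target, hidden):
    """Mean absolute error over the artificially hidden entries only."""
    n_hidden = hidden.sum()
    if n_hidden == 0:
        return 0.0
    return float((np.abs(imputed - target) * hidden).sum() / n_hidden)
```

Because the hidden entries were observed in the raw record, their true values are known, which is what allows such a loss to supervise imputation without any additional labels.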