PRISM: Mitigating EHR Data Sparsity via Learning from Missing Feature Calibrated Prototype Patient Representations

Yinghao Zhu, Zixiang Wang, Long He, Shiyun Xie, Xiaochen Zheng, Liantao Ma, Chengwei Pan

Published: 01 Jan 2024 · Last Modified: 02 Aug 2025 · CIKM 2024 · CC BY-SA 4.0

Abstract: Electronic Health Records (EHRs) provide valuable patient data but often suffer from sparsity, which poses significant challenges for predictive modeling. Conventional imputation methods do not adequately distinguish between real and imputed data, which can lead to inaccurate patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data through prototype representations of similar patients, yielding denser and more accurate embeddings. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature in light of its missing status. Additionally, it incorporates a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance on in-hospital mortality and 30-day readmission prediction, showing its effectiveness in handling EHR data sparsity. For reproducibility and further research, we have publicly released the code at https://github.com/yhzhu99/PRISM.
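The abstract does not give PRISM's equations, so the following is only a toy sketch of the general idea it describes, not the authors' implementation (see the linked repository for that). It assumes three hypothetical simplifications. First, a fixed confidence (`IMPUTED_CONFIDENCE`) for imputed features stands in for PRISM's learned feature confidence module. Second, a confidence-weighted cosine similarity stands in for the paper's similarity metric. Third, a simple blend pulls low-confidence features toward a prototype averaged over the most similar patients. PRISM performs its calibration on learned embeddings; here it is applied directly to raw feature vectors for readability.

```python
from math import sqrt

# Hypothetical stand-in for PRISM's learned feature confidence:
# observed features get full confidence, imputed ones a fixed lower value.
IMPUTED_CONFIDENCE = 0.3


def feature_confidence(mask):
    """Map a 1 (observed) / 0 (imputed) mask to per-feature confidences."""
    return [1.0 if m else IMPUTED_CONFIDENCE for m in mask]


def weighted_cosine(a, ca, b, cb):
    """Cosine similarity in which feature k is weighted by ca[k] * cb[k],
    so features that are imputed in either patient count for less."""
    w = [x * y for x, y in zip(ca, cb)]
    dot = sum(wk * x * y for wk, x, y in zip(w, a, b))
    na = sqrt(sum(wk * x * x for wk, x in zip(w, a)))
    nb = sqrt(sum(wk * y * y for wk, y in zip(w, b)))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def calibrate(query, query_mask, cohort, cohort_masks, k=2):
    """Hypothetical prototype calibration: build a confidence-weighted
    prototype from the k most similar patients, then blend each query
    feature toward it in proportion to (1 - confidence)."""
    if not cohort:
        raise ValueError("cohort must contain at least one patient")
    cq = feature_confidence(query_mask)
    scored = []
    for x, m in zip(cohort, cohort_masks):
        cx = feature_confidence(m)
        scored.append((weighted_cosine(query, cq, x, cx), x, cx))
    # Keep the k neighbours with the highest confidence-aware similarity.
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]

    # Per-feature prototype, weighting each neighbour by its own confidence.
    prototype = []
    for j in range(len(query)):
        wsum = sum(cx[j] for _, _, cx in top)
        prototype.append(sum(cx[j] * x[j] for _, x, cx in top) / wsum)

    # Observed values (confidence 1.0) pass through unchanged;
    # imputed values lean on the prototype.
    return [c * q + (1 - c) * p for c, q, p in zip(cq, query, prototype)]


if __name__ == "__main__":
    query = [1.0, 0.0, 2.0]      # middle feature was zero-filled
    query_mask = [1, 0, 1]
    cohort = [[1.0, 5.0, 2.0], [1.1, 4.0, 2.1], [-3.0, 0.0, -1.0]]
    cohort_masks = [[1, 1, 1]] * 3
    print(calibrate(query, query_mask, cohort, cohort_masks))
```

In this example the two observed features stay at 1.0 and 2.0, while the zero-filled middle feature moves to about 3.15, toward the values of the two nearest patients (4.0 and 5.0). The dissimilar third patient is excluded because its similarity score is negative. Down-weighting imputed features inside the similarity itself keeps the placeholder 0.0 from dominating the choice of neighbours, which is the overreliance the abstract warns against.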