Abstract: The main challenge in applying data analysis to electronic health records (EHR), and in building prediction models from them, lies in the high rate of missing values in such data. Commonly employed approaches to handling missing data include mean or median filling, carrying forward, hot-deck, resampling [1], and multiple imputation [2]. Recently, Yuan et al. proposed 3-dimensional multiple imputation with chained equations (3D-MICE) to impute missing clinical time series data [3]. Lipton et al. demonstrated that missing indicators [4] are highly effective for handling missingness in temporal data [5]. Kim and Chi proposed a bio-inspired approach called Temporal Belief Memory for handling missingness in irregular sequential data [6]. Other approaches focus on reconstructing missing entries using latent patterns extracted from the original data, such as EM imputation [7], autoencoders [8], and matrix decomposition based imputation methods [9], [10], including SVDImpute [11], softImpute [12], and two patient record densifiers, i.e., the Individual Basis Approach (IBA) and the Shared Basis Approach (SBA) [13]. Building on IBA and SBA, Yang et al. proposed a subgroup basis approach (TGBA-F) for patients partitioned by age and gender [14]. To take advantage of factorized latent patterns in the data to handle the missingness in MIMIC-III, in this work we explored the two matrix decomposition based patient record densifiers, IBA and SBA, which hypothesize that latent patterns exist either heterogeneously or homogeneously among patients, and we compared their performance against various baselines. Since patients’ demographic information is not accessible in our dataset, TGBA-F, with its age and gender subgroup basis analysis, is not applicable. The results showed that IBA achieved the best performance and had better learning efficiency than 3D-MICE.
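For orientation, the simplest baselines named above (mean filling, carrying forward, and missing indicators) can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation evaluated in this work: the function names and the `heart_rate` series are hypothetical, and missing entries are assumed to be encoded as `None`.

```python
from statistics import mean


def mean_fill(series):
    """Replace each missing entry (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = mean(observed) if observed else None
    return [fill if v is None else v for v in series]


def carry_forward(series):
    """Replace each missing entry with the most recent observed value.

    Gaps before the first observation have nothing to carry and stay None.
    """
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def missing_indicators(series):
    """Return a 0/1 mask with 1 wherever the original entry was missing."""
    return [1 if v is None else 0 for v in series]


# Hypothetical hourly measurements for one patient; None marks a missing value.
heart_rate = [None, 80.0, None, None, 90.0, None]

mean_filled = mean_fill(heart_rate)
forward_filled = carry_forward(heart_rate)
mask = missing_indicators(heart_rate)
```

In this sketch the indicator mask is computed from the raw series, so it can be passed to a model alongside either filled version without losing the information about which entries were actually observed.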