Information Theoretic Approaches for Testing Missingness in Predictive Models

Published: 06 Jul 2020, Last Modified: 05 May 2023
Venue: ICML Artemiss 2020
Readers: Everyone
TL;DR: Mutual-information-based hypothesis tests for missingness
Keywords: mutual information, missingness, hypothesis test
Abstract: In predictive modeling, missing data can lead to biased models even when imputation is applied, so it is important to assess the missingness process of the data. We present two hypothesis tests for assessing the dependencies of the missingness process: MI-MCAR (mutual information for missing completely at random) and MI-US (mutual information for unobserved sources). MI-MCAR tests marginal independence between the missingness pattern and the data matrix, while MI-US is a conditional randomization test (CRT) of the dependence of the missingness pattern on unobserved sources. These methods can be applied to heterogeneous data types and can help identify missingness pathologies that specifically affect performance on regression tasks. We evaluate our methods on simulated and pseudo-simulated datasets and show that they can identify data whose missingness is due to unobserved sources.
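To make the idea of testing marginal independence between a missingness pattern and the data concrete, here is a minimal sketch in Python. It is not the paper's MI-MCAR procedure: it assumes a single discrete observed variable, a plug-in mutual information estimate, and a plain permutation null, all chosen for illustration. The function names `plugin_mutual_information` and `permutation_mi_test` and the synthetic data are hypothetical.

```python
import numpy as np


def plugin_mutual_information(x, y):
    """Plug-in estimate (in nats) of I(X; Y) for two discrete 1-D arrays."""
    _, x_codes = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, y_codes = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Empirical joint distribution from the contingency table of codes.
    joint = np.zeros((x_codes.max() + 1, y_codes.max() + 1))
    np.add.at(joint, (x_codes, y_codes), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    ratio = joint[nonzero] / (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))


def permutation_mi_test(mask, observed, n_permutations=500, seed=0):
    """Permutation test of H0: the missingness mask is independent of `observed`."""
    rng = np.random.default_rng(seed)
    statistic = plugin_mutual_information(mask, observed)
    # Shuffling the mask breaks any dependence while keeping both marginals.
    null = np.array([
        plugin_mutual_information(rng.permutation(mask), observed)
        for _ in range(n_permutations)
    ])
    p_value = (1 + np.sum(null >= statistic)) / (1 + n_permutations)
    return statistic, float(p_value)


# Synthetic illustration (assumed data, not from the paper).
rng = np.random.default_rng(1)
n = 2000
observed = rng.integers(0, 3, size=n)
# Mask drawn independently of the observed variable.
mask_mcar = rng.random(n) < 0.3
# Mask whose missingness rate depends on the observed variable.
mask_dep = rng.random(n) < np.where(observed == 2, 0.6, 0.1)

stat_mcar, p_mcar = permutation_mi_test(mask_mcar, observed)
stat_dep, p_dep = permutation_mi_test(mask_dep, observed)
print("independent mask:", stat_mcar, p_mcar)
print("dependent mask:  ", stat_dep, p_dep)
```

A small p-value for the dependent mask indicates that the missingness pattern carries information about the observed variable. Testing dependence on unobserved sources, as MI-US does, requires a conditional test and is not covered by this sketch.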