- TL;DR: Mutual-information-based hypothesis tests for assessing the missingness process
- Keywords: mutual information, missingness, hypothesis test
- Abstract: In predictive modeling, missing data can lead to biased models even when imputation is applied. It is therefore important to assess the missingness process of the data. We present two hypothesis tests for assessing what the missingness process depends on: MI-MCAR (mutual information for missing completely at random) and MI-US (mutual information for unobserved sources). MI-MCAR tests marginal independence between the missingness pattern and the data matrix, while MI-US is a conditional randomization test (CRT) of whether the missingness pattern depends on unobserved sources. These methods can be applied to heterogeneous data types and can identify missingness pathologies that specifically affect performance on regression tasks. We evaluate our methods on simulated and pseudo-simulated datasets and show that they identify data whose missingness is due to unobserved sources.
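
To make the idea of testing marginal independence with mutual information concrete, here is a minimal sketch of a permutation test between a missingness indicator and one fully observed discrete feature. It is not the authors' MI-MCAR procedure: the plug-in estimator, the permutation null, the toy data, and the names `mutual_information` and `permutation_pvalue` are all assumptions made for illustration, and the conditional randomization test behind MI-US is not covered.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def permutation_pvalue(mask, feature, n_perm=500, seed=0):
    """Permutation p-value for H0: the missingness mask is independent of the feature.

    Shuffling the mask breaks any dependence on the feature while keeping the
    overall missingness rate fixed (illustrative null, assumed here).
    """
    rng = random.Random(seed)
    observed = mutual_information(mask, feature)
    shuffled = list(mask)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(shuffled, feature) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): a fully observed feature with three levels.
data_rng = random.Random(1)
feature = [data_rng.randrange(3) for _ in range(300)]

# Mask that depends on the feature: level 2 is missing far more often.
dependent_mask = [
    int(data_rng.random() < (0.8 if f == 2 else 0.1)) for f in feature
]
# Mask drawn independently of the feature (MCAR-like).
independent_mask = [int(data_rng.random() < 0.3) for _ in feature]

p_dependent = permutation_pvalue(dependent_mask, feature)
p_independent = permutation_pvalue(independent_mask, feature)
print(f"dependent mask:   p = {p_dependent:.3f}")
print(f"independent mask: p = {p_independent:.3f}")
```

A small p-value for the dependent mask indicates that missingness is not independent of the observed feature. Extending this sketch to a full data matrix or to continuous and mixed data types would require a different mutual information estimator.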