Robust prediction under missingness shifts

15 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: supervised learning, missing data, distribution shift
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Missing data makes prediction more challenging, and the method chosen to handle it can greatly affect model performance. In many real-world problems, the best predictive performance is achieved by models that exploit informative missingness, i.e., the signal carried by the fact that a value is missing. However, the reasons why data go missing can change once a model is deployed. In that case, performance on the development data may no longer be a good model-selection criterion, and approaches that do not rely on informative missingness may be preferable. To identify the conditions under which prediction is robust, we formalise a missingness shift as any change in the conditional probability of a value being missing. We then show that the optimal predictor is affected only by non-ignorable shifts, in which the probability of missingness depends on unobserved data. When a non-ignorable shift changes the optimal predictor, we find empirically that predictors which use the information encoded in the missingness may still achieve robust predictions, although different methods appear robust to different types of shift. Disregarding informative missingness was most beneficial when the probability of missingness was influenced by the outcome.
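The last point of the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup: the data-generating process, the sigmoid missingness mechanism, the zero-imputation baseline, and every function name are assumptions made for illustration only. A single feature goes missing with a probability that depends on the outcome, and the direction of that dependence flips between the development (source) data and the deployment (target) data. One predictor uses a missingness indicator, the other only zero-imputes.

```python
import math
import random
from statistics import fmean


def simulate(n, slope, rng):
    """Return rows (x_or_None, y); x is None when it is missing.

    Assumed mechanism: P(missing | y) = sigmoid(slope * y), so the
    outcome drives missingness and `slope` controls the direction.
    """
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 0.5)
        p_missing = 1.0 / (1.0 + math.exp(-slope * y))
        rows.append((None if rng.random() < p_missing else x, y))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_with_indicator(rows):
    """Zero-imputation plus a missingness indicator.

    With x set to 0 when missing, least squares on [1, x, indicator]
    decouples into a line fitted on observed rows and a separate mean
    for the rows where x is missing.
    """
    observed = [(x, y) for x, y in rows if x is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    missing_mean = fmean([y for x, y in rows if x is None])
    return lambda x: missing_mean if x is None else intercept + slope * x


def fit_without_indicator(rows):
    """Zero-imputation only: one line fitted on all rows."""
    xs = [0.0 if x is None else x for x, _ in rows]
    intercept, slope = fit_line(xs, [y for _, y in rows])
    return lambda x: intercept + slope * (0.0 if x is None else x)


def mse(predict, rows):
    return fmean([(predict(x) - y) ** 2 for x, y in rows])


def run_experiment(n=20000, seed=0):
    rng = random.Random(seed)
    source = simulate(n, slope=3.0, rng=rng)    # development data
    target = simulate(n, slope=-3.0, rng=rng)   # shifted missingness
    fitters = {"with_indicator": fit_with_indicator,
               "without_indicator": fit_without_indicator}
    results = {}
    for name, fitter in fitters.items():
        predict = fitter(source)
        results[name] = {"source_mse": mse(predict, source),
                         "target_mse": mse(predict, target)}
    return results


if __name__ == "__main__":
    for name, scores in run_experiment().items():
        print(name, scores)
```

In this toy setting the indicator-based predictor fits the development data at least as well, because it nests the other model, but it should degrade more once the outcome-dependent missingness mechanism flips. This matches the qualitative claim in the abstract only; it says nothing about the magnitudes or the methods studied in the paper.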
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 202