Out-of-Distribution Generalization Analysis via Influence Function

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Abstract: The mismatch between the training dataset and the target environment is a major challenge for current machine learning systems. When training data is collected from multiple environments and the evaluation may be on any new environment, we face an Out-of-Distribution (OOD) generalization problem, which aims to find the model with the best OOD accuracy, i.e., the best worst-environment accuracy. However, with limited access to environments, the worst environment may be unseen, and test accuracy is then a biased estimate of OOD accuracy. In this paper, we show that test accuracy can dramatically fail to identify OOD accuracy and can mislead the tuning procedure. To address this, we introduce the Influence Function, a classical tool from robust statistics, into the OOD generalization problem, and propose the variance of the influence function across training environments as a measure of a model's stability. We show that the proposed index, together with test accuracy, can help us discern whether OOD algorithms are needed and whether a model achieves good OOD generalization.
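As a rough illustration of the idea in the abstract (not the paper's exact estimator: the logistic-regression setup, the function names, and the trace-of-covariance aggregation below are my own simplifications), one can fit a model on pooled environments, compute a per-environment influence vector H⁻¹gₑ from each environment's gradient gₑ and the pooled Hessian H, and take the variance of those vectors across environments as a stability index:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_and_hessian(w, X, y, reg=1e-2):
    # Gradient and Hessian of L2-regularized logistic loss on (X, y).
    p = sigmoid(X @ w)
    g = X.T @ (p - y) / len(y) + reg * w
    S = p * (1 - p)                      # per-sample sigmoid curvature
    H = (X.T * S) @ X / len(y) + reg * np.eye(X.shape[1])
    return g, H

def influence_variance(w, envs, reg=1e-2):
    # One influence-style vector H^{-1} g_e per environment, using the
    # Hessian of the pooled loss; the index is the trace of their
    # covariance across environments (an illustrative aggregation).
    X_all = np.vstack([X for X, _ in envs])
    y_all = np.concatenate([y for _, y in envs])
    _, H = grad_and_hessian(w, X_all, y_all, reg)
    infl = np.array([np.linalg.solve(H, grad_and_hessian(w, X, y, reg)[0])
                     for X, y in envs])
    return float(infl.var(axis=0).sum())

# Toy data: two environments with a distribution shift in the features.
rng = np.random.default_rng(0)
d, envs = 3, []
for shift in (0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(50, d))
    y = (X[:, 0] + rng.normal(0.0, 0.1, 50) > shift).astype(float)
    envs.append((X, y))

# Fit on the pooled environments with plain gradient descent.
w = np.zeros(d)
X_all = np.vstack([X for X, _ in envs])
y_all = np.concatenate([y for _, y in envs])
for _ in range(200):
    g, _ = grad_and_hessian(w, X_all, y_all)
    w -= 0.5 * g

index = influence_variance(w, envs)
print(index)  # non-negative; larger values suggest less stable behavior across environments
```

A model whose per-environment influence vectors agree (low index) is, in the spirit of the abstract, more stable across training environments; a large index flags a model whose test accuracy may be a misleading proxy for OOD accuracy.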
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2101.08521/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=ymq1IMKzeK