Out-of-Distribution Generalization with Maximal Invariant Predictor

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: out-of-distribution generalization, extrapolation
Abstract: Out-of-Distribution (OOD) generalization is the problem of seeking a predictor whose performance in the worst environment is optimal. This paper makes both theoretical and algorithmic contributions to the OOD problem. We consider the set of all invariant features, conditioned on which the target variable and the environment variable become independent, and theoretically prove that one can obtain an OOD-optimal predictor by looking for the mutual-information-maximizing feature among the invariant features. We establish this result as the \textit{Maximal Invariant Predictor condition}. Our theoretical work is closely related to approaches such as Invariant Risk Minimization and Invariant Rationalization. We also derive from our theory the \textit{Inter Gradient Alignment} (IGA) algorithm, which uses a parametrization trick to conduct \textit{feature searching} and \textit{predictor training} at once. We develop an extension of Colored-MNIST that represents the pathological OOD situation more accurately than the original version, and demonstrate the superiority of IGA over previous methods on both the original and the extended versions of Colored-MNIST.
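
The condition the abstract describes can be stated compactly. The following is a restatement of that description only, not an excerpt from the paper; the notation ($\Phi$ for a feature map, $\mathcal{I}$ for the invariant set, $E$ for the environment variable) is assumed for illustration:

```latex
% Invariant features: conditioning on \Phi(X) makes Y and E independent.
\mathcal{I} = \{\, \Phi \;:\; Y \perp E \mid \Phi(X) \,\}
% Maximal Invariant Predictor: the mutual-information-maximizing
% feature among the invariant features.
\Phi^{*} \in \arg\max_{\Phi \in \mathcal{I}} I\big(Y;\, \Phi(X)\big)
```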
One-sentence Summary: We formalize an information-theoretic condition under which an invariant feature can be used for the OOD generalization problem, and propose a novel algorithm to seek an OOD-optimal predictor.
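
As a concrete illustration of gradient alignment across environments, here is a minimal PyTorch sketch. The abstract does not specify IGA's objective, so the penalty form used below (the variance of per-environment risk gradients) and all names (`iga_loss`, `env_batches`, `lam`) are assumptions, not the paper's definition:

```python
import torch

def iga_loss(model, env_batches, lam=1e3):
    """env_batches: list of (x, y) pairs, one per training environment."""
    params = [p for p in model.parameters() if p.requires_grad]
    env_risks, env_grads = [], []
    for x, y in env_batches:
        risk = torch.nn.functional.cross_entropy(model(x), y)
        # create_graph=True keeps the gradients differentiable, so the
        # alignment penalty itself can be backpropagated through.
        grads = torch.autograd.grad(risk, params, create_graph=True)
        env_risks.append(risk)
        env_grads.append(torch.cat([g.reshape(-1) for g in grads]))
    mean_risk = torch.stack(env_risks).mean()
    G = torch.stack(env_grads)                            # (n_envs, n_params)
    penalty = ((G - G.mean(dim=0)) ** 2).sum(dim=1).mean()  # gradient variance
    return mean_risk + lam * penalty
```

With a single shared model, one training step would compute `iga_loss(model, env_batches).backward()` and then call the optimizer; the penalty pushes the per-environment gradients toward agreement while the mean risk trains the predictor.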
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2008.01883/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=eDicni6REL
7 Replies
