Feature Accompaniment: Is It Feasible to Learn Out-of-Distribution Generalizable Representations with In-Distribution Data?

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: out-of-distribution generalization, representation learning, neural networks
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We empirically show that learning representations that generalize out-of-distribution (OOD) is hard even with "oracle" representation learning objectives, and we unveil a new OOD failure mode termed feature accompaniment.
Abstract: Learning representations that generalize out-of-distribution (OOD) is critical for machine learning models to be deployed in the real world. However, despite significant effort over the last decade, algorithmic advances in this direction have been limited. In this work, we seek to answer the fundamental question: is learning OOD generalizable representations with only in-distribution data really feasible? We first show empirically that, perhaps surprisingly, even with an "oracle" representation learning objective that allows the model to explicitly fit good representations on the training set, the learned model still underperforms OOD on a wide range of distribution shift benchmarks. To explain this gap, we then formally study the OOD generalization of two-layer ReLU networks trained by stochastic gradient descent (SGD) in a structured setting, unveiling an unexplored OOD generalization failure mode that we refer to as feature accompaniment. We show that this failure mode essentially stems from the inductive biases of non-linear neural networks and fundamentally differs from the prevailing narrative of spurious correlations. Overall, our results imply that it may generally not be feasible to learn OOD generalizable representations without explicitly accounting for the inductive biases of SGD-trained neural networks. They also provide new insights into OOD generalization failures, suggesting that OOD generalization in practice may behave very differently from existing theoretical models.
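To make the abstract's setup concrete, below is a minimal sketch (not the authors' code) of the two ingredients it describes: a two-layer ReLU network trained by SGD, and an "oracle" representation learning objective that explicitly regresses the network's hidden representation onto known-good target features on in-distribution training data. The data dimensions and the `oracle_features` tensor are illustrative assumptions, not details from the paper.

```python
# Minimal sketch, assuming an "oracle" objective means directly fitting the
# hidden representation to target features on the training set.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_hidden, n_train = 32, 64, 512

class TwoLayerReLU(nn.Module):
    """Two-layer ReLU network; the hidden layer is the learned representation."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.hidden = nn.Linear(d_in, d_hidden)
        self.head = nn.Linear(d_hidden, 1)

    def representation(self, x):
        return torch.relu(self.hidden(x))

    def forward(self, x):
        return self.head(self.representation(x))

# Synthetic in-distribution inputs and hypothetical oracle target features.
x_train = torch.randn(n_train, d_in)
oracle_features = torch.randn(n_train, d_hidden)  # stand-in for "good" representations

model = TwoLayerReLU(d_in, d_hidden)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# "Oracle" objective: explicitly fit the learned representation to the oracle
# features on in-distribution data, rather than only fitting labels.
for step in range(200):
    opt.zero_grad()
    loss = ((model.representation(x_train) - oracle_features) ** 2).mean()
    loss.backward()
    opt.step()
```

Even under such a direct representation-fitting objective, the abstract's claim is that the SGD-trained network can still learn additional, unintended feature components (feature accompaniment) that harm performance under distribution shift.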
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1827