The Invariance Starvation Hypothesis

16 Sept 2024 (modified: 23 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Spurious Correlations, Reasoning, Robustness
Abstract: Deep neural networks are known to learn and rely on spurious correlations during training, which prevents them from being reliable and from solving highly complex problems. While many proposed solutions overcome such reliance in different, tailored settings, current understanding of how spurious correlations form is limited. Proposed solutions with promising results assume that networks trained with empirical risk minimization learn spurious correlations because of a preference for simpler features, and that overcoming them requires either further processing of the networks' learned representations or re-training on a modified dataset in which the proportion of training data with spurious features is significantly lower. In this paper, we aim to better understand how spurious correlations form by rigorously studying the role that data plays in their formation. We show that in reasoning tasks with simple input samples, simply drawing more data from the same training distribution overcomes spurious correlations, even when the proportion of samples with spurious features is maintained. In other words, once the network has enough data to encode the invariant function appropriately, it no longer relies on spurious features, regardless of how strong they are. We observe the same results in settings with more complex distributions and an intractable number of participating features, such as vision and language. In such settings, however, drawing more samples from the training distribution while maintaining the proportion can at times exacerbate spurious correlations, because it introduces new samples that differ significantly from those in the original training set.
Taking inspiration from reasoning tasks, we present an effective remedy that ensures drawing more samples from the distribution always overcomes spurious correlations.
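The sampling protocol the abstract describes — scaling the training set while holding the proportion of samples carrying a spurious feature fixed — can be illustrated with a minimal toy sketch. All names below (`sample_dataset`, the feature layout, the 90% spurious fraction) are illustrative assumptions, not the paper's actual setup:

```python
import random

def sample_dataset(n, spurious_fraction, seed=0):
    """Draw n samples from a toy distribution.

    Each sample is (core_feature, spurious_feature, label). The core
    (invariant) feature always determines the label; the spurious
    feature matches the label on roughly `spurious_fraction` of the
    samples and is random noise on the rest.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        core = label  # invariant feature: perfectly predictive
        if rng.random() < spurious_fraction:
            spurious = label  # correlated shortcut
        else:
            spurious = rng.randint(0, 1)  # uninformative noise
        data.append((core, spurious, label))
    return data

# Scaling the dataset 100x while the spurious proportion stays fixed,
# as in the paper's experimental setting:
small = sample_dataset(1_000, spurious_fraction=0.9)
large = sample_dataset(100_000, spurious_fraction=0.9)
```

Because both datasets are drawn from the same distribution with the same `spurious_fraction`, the spurious feature remains equally predictive at every scale; only the amount of evidence for the invariant function grows.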
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1209