Algorithm for Concept Extrapolation: Diverse Generalization via Selective Disagreement

Oliver Daniels; Stuart Armstrong; Alexandre Maranhão; Rebecca Gorman; Mahirah Fairuz Rahman

Submitted to ICLR 2025. 27 Sept 2024 (modified: 05 Feb 2025). License: CC BY 4.0

Keywords: domain adaptation, spurious correlations, simplicity bias, diverse ensembles

TL;DR: We learn classifiers that disagree on instances that break spurious correlations, using an exponentially weighted regularization loss.

Abstract: Standard deep learning approaches often struggle to handle out-of-distribution data, especially when the distribution shift breaks spurious correlations. Some approaches to handling spurious correlations under distribution shift aim to separate causal and spurious features without access to target distribution data, but they rely on labeled data from different domains or on contingent assumptions about the nature of neural representations. Existing methods that do use unlabeled target data make strict assumptions about the target data distribution. To overcome these limitations, we present the Algorithm for Concept Extrapolation (ACE). Using an exponentially weighted disagreement loss to maximize disagreement on target instances that break spurious correlations, ACE achieves state-of-the-art performance on complete spurious correlation benchmarks. We also show that ACE is robust to unlabeled target distributions where spurious and ground-truth features are not statistically independent. Finally, we demonstrate the applicability of ACE to goal misgeneralization in deep reinforcement learning, with our "ACE agent" achieving a 16% higher level completion rate on the CoinRun goal misgeneralization problem when the coin is randomly placed in the level.
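
The abstract does not give the form of the exponentially weighted disagreement loss, so the following is only an illustrative sketch of one way such a loss could be built, not the authors' formulation. It assumes two binary classifier heads that output positive-class probabilities on an unlabeled target batch. The function names, the `temperature` parameter, and the choice of a softmax over per-instance disagreement scores are all hypothetical.

```python
import math

def disagreement_scores(p1, p2):
    """Per-instance probability that two binary heads predict different labels."""
    return [a * (1.0 - b) + (1.0 - a) * b for a, b in zip(p1, p2)]

def exponential_weights(scores, temperature=0.1):
    """Softmax of scores / temperature (hypothetical weighting scheme).

    A low temperature concentrates weight on the instances where the
    heads already disagree most, so the loss acts selectively.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def selective_disagreement_loss(p1, p2, temperature=0.1, eps=1e-8):
    """Weighted negative log-disagreement; lower means more disagreement
    on the highly weighted instances. An assumed form, not the paper's."""
    scores = disagreement_scores(p1, p2)
    weights = exponential_weights(scores, temperature)
    return -sum(w * math.log(s + eps) for w, s in zip(weights, scores))

# Usage: the second instance is the one where the heads disagree,
# so it receives most of the weight.
head_a = [0.9, 0.9, 0.5]
head_b = [0.9, 0.1, 0.5]
weights = exponential_weights(disagreement_scores(head_a, head_b))
loss = selective_disagreement_loss(head_a, head_b)
```

Under these assumptions, instances on which the heads agree contribute almost nothing, which is the intuition behind disagreeing only where a spurious correlation is broken rather than uniformly across the target data.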

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 11636
