Identifying Spurious Correlations using Counterfactual Alignment

Joseph Paul Cohen; Louis Blankemeier; Akshay S Chaudhari

Published: 22 Jan 2025, Last Modified: 22 Jan 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations of black-box classifiers. Our method generates counterfactual images with respect to one classifier and feeds them into other classifiers to see whether they also induce changes in those classifiers' outputs. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. This is validated by observing intuitive trends in face-attribute and waterbird classifiers, as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Furthermore, utilizing the CF alignment method, we demonstrate that we can evaluate robust optimization methods (GroupDRO, JTT, and FLAC) by detecting a reduction in spurious correlations.
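
The idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (the linked repository is the place for that). It assumes hypothetical logistic classifiers over a three-element feature vector instead of image models, and it builds the counterfactual by shifting the input against the base classifier's weight direction. The helper names (`make_classifier`, `counterfactual`, `relative_change`) and the ratio used to compare responses are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_classifier(weights):
    """Toy logistic classifier; a hypothetical stand-in for a black-box model."""
    def predict(x):
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    return predict


def counterfactual(x, base_weights, step):
    """Toy counterfactual: move x against the base classifier's weight
    direction so that the base classifier's output decreases."""
    return [xi - step * w for xi, w in zip(x, base_weights)]


def relative_change(base, other, x, x_cf):
    """Change in `other`'s output divided by the change in `base`'s output
    when moving from the original input to the counterfactual."""
    base_delta = base(x_cf) - base(x)
    other_delta = other(x_cf) - other(x)
    return other_delta / base_delta


# Hypothetical weights: `aligned` mostly relies on the same feature as `base`,
# while `independent` relies on a feature the base classifier ignores.
base_w = [1.0, 0.0, 0.0]
base = make_classifier(base_w)
aligned = make_classifier([0.8, 0.1, 0.0])
independent = make_classifier([0.0, 0.0, 1.0])

x = [1.0, 0.5, -0.5]
x_cf = counterfactual(x, base_w, step=2.0)

rel_aligned = relative_change(base, aligned, x, x_cf)
rel_independent = relative_change(base, independent, x, x_cf)

print("aligned classifier relative change:", round(rel_aligned, 3))
print("independent classifier relative change:", round(rel_independent, 3))
```

In this toy setting, a downstream classifier whose output moves together with the base classifier's output under the counterfactual is a candidate for sharing a feature with it, which may or may not be spurious. A relative change near zero suggests the two classifiers rely on different features.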

Submission Length: Regular submission (no more than 12 pages of main content)

Video: https://www.youtube.com/watch?v=Qdz3woTLCF4

Code: https://github.com/ieee8023/latentshift

Assigned Action Editor: ~Dmitry_Kangin1

Submission Number: 3416
