FINDING AND FIXING SPURIOUS PATTERNS WITH EXPLANATIONS

29 Sept 2021 (modified: 22 Oct 2023) | ICLR 2022 Conference Withdrawn Submission | Readers: Everyone
Keywords: explainability, interpretability, debugging, spurious patterns, spurious correlations, image classification
Abstract: Machine learning models often use spurious patterns such as "relying on the presence of a person to detect a tennis racket," which do not generalize. In this work, we present an end-to-end pipeline for identifying and mitigating spurious patterns for image classifiers. We start by finding patterns such as "the model's prediction for tennis racket changes 63% of the time if we hide the people." Then, if a pattern is spurious, we mitigate it via a novel form of data augmentation. We demonstrate that this approach identifies a diverse set of spurious patterns and that it mitigates them by producing a model that is both more accurate on a distribution where the spurious pattern is not helpful and more robust to distribution shift.
Supplementary Material: zip
One-sentence Summary: We use counterfactual images to find (via explanations) and fix (via data augmentation) spurious patterns in image classifiers.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2106.02112/code)
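
The statistic quoted in the abstract ("the model's prediction for tennis racket changes 63% of the time if we hide the people") can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flip_rate`, the `predict` callable, the boolean masks, and the constant-fill way of hiding an object are all assumptions made for illustration, since the abstract does not say how objects are hidden.

```python
import numpy as np

def flip_rate(predict, images, masks, fill_value=0.0):
    """Fraction of images whose predicted label changes once the masked object is hidden.

    predict: hypothetical callable mapping an (H, W, C) array to a label.
    masks:   boolean (H, W) arrays marking the object to hide (e.g. the people).
    """
    if len(images) == 0:
        raise ValueError("need at least one image")
    flips = 0
    for image, mask in zip(images, masks):
        original = predict(image)
        # Assumption: hide the object by overwriting its pixels with a constant.
        counterfactual = np.where(mask[..., None], fill_value, image)
        flips += int(predict(counterfactual) != original)
    return flips / len(images)

# Toy stand-in classifier (hypothetical): predicts True only if the top-left pixel is bright.
def toy_predict(image):
    return bool(image[0, 0].mean() > 0.5)
```

Under these assumptions, a high flip rate for a class when some other object is hidden suggests the model depends on that object; whether the dependence is spurious is a separate judgment, as the abstract notes.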
