Keywords: Data Attribution, Debiasing
TL;DR: We debias classifiers by using data attribution techniques to isolate the specific training examples that disproportionately drive reliance on the spurious correlation.
Abstract: Spurious correlations in the training data can cause serious problems for machine learning deployment. However, common debiasing approaches that intervene on the training procedure (e.g., by adjusting the loss) can be especially sensitive to regularization and hyperparameter selection. In this paper, we advocate for a data-based perspective on model debiasing by directly targeting the root causes of the bias within the training data itself. Specifically, we leverage data attribution techniques to isolate the specific training examples that disproportionately drive reliance on the spurious correlation. We find that removing these training examples can efficiently debias the final classifier. Moreover, our method introduces no additional hyperparameters and requires no group annotations for the training data.
Submission Number: 88
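
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a precomputed attribution matrix from any data attribution method, plus a mask marking the held-out examples on which the model fails because of the spurious correlation. The function name, argument names, and the choice to threshold at zero are all hypothetical and chosen for illustration only.

```python
import numpy as np


def select_examples_to_remove(attribution, biased_mask):
    """Return indices of training examples estimated to drive the bias.

    attribution[i, j] (assumed input): estimated effect of training example i
        on the model's correct-class margin for held-out example j.
    biased_mask[j] (assumed input): True for held-out examples on which the
        model fails because it relies on the spurious correlation.
    """
    attribution = np.asarray(attribution, dtype=float)
    biased_mask = np.asarray(biased_mask, dtype=bool)
    if attribution.ndim != 2 or attribution.shape[1] != biased_mask.shape[0]:
        raise ValueError("attribution must be (n_train, n_heldout)")
    if not biased_mask.any():
        # Nothing to fix: no held-out example is marked as bias-affected.
        return np.array([], dtype=int)
    # Average estimated effect of each training example on the affected examples.
    harm = attribution[:, biased_mask].mean(axis=1)
    # Illustrative rule: flag examples whose estimated average effect is negative.
    return np.flatnonzero(harm < 0)


# Toy data: 3 training examples, 3 held-out examples (the last two are affected).
toy_attribution = [[0.5, -1.0, -2.0],
                   [0.1, 0.2, 0.3],
                   [-0.4, 0.6, -0.8]]
toy_mask = [False, True, True]
removed = select_examples_to_remove(toy_attribution, toy_mask)
print(removed.tolist())  # → [0, 2]
```

Retraining on the remaining examples would then give the debiased classifier. Thresholding at zero is used here only because it avoids a tunable cutoff; the paper may select examples differently.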