Post-Training Recovery from Injected Bias with Self-Influence

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep learning, dataset bias, debiasing
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Learning generalized models from biased data with strong spurious correlations to the class label is an important step toward fairness in deep learning. In the absence of prior knowledge or supervision of bias, recent studies tackle the problem by presuming the bias severity to be sufficiently high and employing a bias-amplified model trained by empirical risk minimization (ERM) to identify and utilize bias-conflicting samples, i.e., samples free of spurious correlations. However, imprecision in detecting bias-conflicting samples injects erroneous signals during training; consequently, the model learns malignant biases instead of excluding them. In practice, since the presumption about the magnitude of bias often does not hold, it is important for a model to perform robustly across a wide spectrum of biases. In this paper, we propose SePT (Self-influence-based Post-Training), a fine-tuning framework that leverages the self-influence score to filter bias-conflicting samples, yielding a pivotal subset with significantly diminished spurious correlations. Our method enables a biased model to quickly recover from learned bias through lightweight fine-tuning. In addition, SePT utilizes the remaining training data to adjust the model, thereby maintaining robust performance under weak spurious correlation or even in its absence. Experiments on diverse benchmark datasets with a wide range of bias strengths show that SePT boosts the performance of both bias-injected and state-of-the-art debiased models.
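The abstract's filtering step can be illustrated with a minimal sketch. Assuming a TracIn-style self-influence score (the squared norm of a sample's own loss gradient, so that samples the model struggles to fit score highest), the sketch below computes per-sample self-influence for a toy logistic-regression model and selects the top-k samples as the candidate bias-conflicting subset. All function names here are illustrative, not from the paper, and the actual SePT scoring and fine-tuning procedure may differ.

```python
# Illustrative sketch: self-influence as squared per-sample gradient norm
# for a toy logistic-regression model. Names are hypothetical, not SePT's API.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_grad(w, x, y):
    # Gradient of the binary cross-entropy loss of one sample w.r.t. weights w.
    p = sigmoid(x @ w)
    return (p - y) * x

def self_influence(w, X, Y):
    # Self-influence of each sample: inner product of its loss gradient
    # with itself (squared gradient norm); high scores indicate samples
    # that conflict with what the model has learned.
    return np.array([per_sample_grad(w, x, y) @ per_sample_grad(w, x, y)
                     for x, y in zip(X, Y)])

def select_bias_conflicting(w, X, Y, k):
    # Indices of the k highest self-influence samples: the candidate
    # bias-conflicting subset used for post-training fine-tuning.
    scores = self_influence(w, X, Y)
    return np.argsort(scores)[-k:]
```

In this sketch, samples whose labels disagree with the (biased) model's confident predictions produce large gradients on their own loss and therefore high self-influence, which is why thresholding this score can isolate a subset with weakened spurious correlation.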
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9048