Holmex: Human-Guided Spurious Correlation Detection and Black-box Model Fixing

22 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Interpretability, Model Editing, Concept Bottleneck Model, Human-AI Interaction
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Human-guided spurious correlation detection, and black-box model fixing with white-box models.
Abstract: We propose Holmex, a method for human-guided spurious correlation detection and black-box model fixing. Holmex lets humans take part easily in the debugging of deep models, a process that includes 1) detecting conceptual spurious correlations in training data and 2) fixing biased black-box models with white-box models. In the first step, we leverage a pre-trained vision-language model to construct separable vectors for high-level, meaningful concepts, and we further propose a novel algorithm based on these concept vectors that is more stable than previous methods. In the second step, unlike previous works, we do not constrain the original biased model to be interpretable and editable. Instead, Holmex is compatible with arbitrary black-box models. To this end, we propose transfer editing, a novel technique that transfers revisions made in interpretable models to black-box models to correct their spurious correlations. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of Holmex in detecting and fixing spurious correlations. The source code and datasets are available at https://anonymous.4open.science/r/Holmex-15DF.
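As a rough illustration of how concept vectors can expose a concept that co-varies with a label (step 1 only), the sketch below scores one concept against two classes. It assumes image embeddings from some pre-trained vision-language encoder are already available (toy 3-dimensional vectors stand in for them) and uses a simple mean-difference concept vector. This is not Holmex's detection algorithm; `concept_vector` and `concept_gap` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale v to unit length; a zero vector has no direction."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def concept_vector(with_concept: Sequence[Vector],
                   without_concept: Sequence[Vector]) -> Vector:
    """Unit vector pointing from the mean embedding of examples without the
    concept toward the mean embedding of examples with it (an assumed,
    mean-difference estimate of the concept direction)."""
    diff = [p - q for p, q in zip(mean(with_concept), mean(without_concept))]
    return normalize(diff)


def concept_gap(class_a: Sequence[Vector],
                class_b: Sequence[Vector],
                cv: Vector) -> float:
    """Mean concept score of class A minus that of class B. A large gap in the
    training data flags the concept as a candidate spurious correlate of the
    label, for a human to confirm or reject."""
    score_a = sum(dot(e, cv) for e in class_a) / len(class_a)
    score_b = sum(dot(e, cv) for e in class_b) / len(class_b)
    return score_a - score_b
```

With toy embeddings in which the first coordinate encodes a "water background" concept, images of one bird class sitting mostly on water would yield a large positive gap, which a human could then judge to be a spurious cue rather than a genuine feature of the class.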
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4585