Feature Level Instance Attribution

ICLR 2025 Conference Submission 1912 Authors

19 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Interpretability, attribution
Abstract: Instance attribution has emerged as one of the most important methodologies for model explainability: it identifies the training data that most strongly influence model predictions, thereby improving model performance and enhancing transparency and trustworthiness. Its applications include data cleaning, where it identifies and corrects poor-quality data to improve model outcomes, as well as domains such as harmful-speech detection, social-network graph labeling, and medical image annotation, where it provides precise insight into how data influence model decisions. Current instance attribution methods identify causal relationships between training data and model predictions: a higher Instance-level Training Data Influence value (IL value) indicates that the corresponding training sample plays a more significant role in the model's prediction. However, these methods can only indicate that a training sample is important; they do not explain why it is important, and a practical algorithm for explaining this behavior is needed. This paper shows that deliberately manipulating the attribution score by modifying samples (e.g., changing a pixel value in image data) can substantially alter the importance of training samples and, in the process, yield feature-level explanations. The proposed Feature Level Instance Attribution (FLIA) algorithm identifies the feature locations in training data that most strongly affect this causal relationship. To avoid frequent model retraining during evaluation, we introduce an unlearning algorithm as an assessment method and provide detailed empirical evidence of our algorithm's efficacy. To facilitate future research, the code is available at: https://anonymous.4open.science/r/FIIA-D60E/.
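The abstract describes the core idea of perturbing individual features of a training sample and observing how its instance-level influence score changes. The sketch below illustrates that idea only under stated assumptions: the IL value is approximated here by a TracIn-style gradient dot product between a training and a test example, and the model, pixel-perturbation size, and helper names (loss_grad, il_value, feature_level_attribution) are hypothetical, not the paper's actual FLIA implementation.

```python
# Minimal sketch, assuming a TracIn-style gradient dot product as the IL value.
# Perturbing one training-image pixel at a time and recording the change in the
# score gives a feature-level attribution map, as described in the abstract.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
loss_fn = nn.CrossEntropyLoss()


def loss_grad(x, y):
    """Flattened gradient of the loss w.r.t. model parameters for one example."""
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])


def il_value(x_train, y_train, x_test, y_test):
    """Assumed instance-level influence of (x_train, y_train) on the test example."""
    return torch.dot(loss_grad(x_train, y_train), loss_grad(x_test, y_test)).item()


def feature_level_attribution(x_train, y_train, x_test, y_test, delta=0.5):
    """Change in the IL value when each pixel of the training image is nudged."""
    base = il_value(x_train, y_train, x_test, y_test)
    attribution = torch.zeros_like(x_train)
    for i in range(x_train.shape[-2]):
        for j in range(x_train.shape[-1]):
            perturbed = x_train.clone()
            perturbed[..., i, j] += delta  # modify a single pixel value
            attribution[..., i, j] = il_value(perturbed, y_train, x_test, y_test) - base
    return attribution


# Dummy 1x28x28 images standing in for a training and a test sample.
x_tr, y_tr = torch.rand(1, 28, 28), torch.tensor(3)
x_te, y_te = torch.rand(1, 28, 28), torch.tensor(3)
heatmap = feature_level_attribution(x_tr, y_tr, x_te, y_te)
print(heatmap.abs().flatten().topk(5).indices)  # pixel locations with the largest effect
```

In this toy setting, the resulting heatmap ranks pixel locations by how strongly a small modification changes the training sample's influence; the paper's unlearning-based evaluation of such attributions is not reproduced here.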
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1912