A Backdoor-based Explainable AI Benchmark for Improved Fidelity in Evaluating Attribution Methods

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Desk Rejected Submission
Keywords: Feature Attribution, Explainable AI Benchmark, Backdoor
TL;DR: A backdoor-based explainable AI benchmark is proposed for improved fidelity in evaluating attribution methods.
Abstract: Attribution methods compute importance scores for input features to explain the output predictions of deep models. However, accurately assessing the performance of attribution methods is difficult because of the lack of ground truth and other confounding factors such as attribution post-processing and explanation objectives. In this paper, we first identify a set of fidelity criteria that must be satisfied for reliable evaluation of attribution methods. We then introduce a benchmarking framework based on Trojaned models that adheres to these fidelity criteria. We theoretically establish the superiority of our approach over existing benchmarks for well-founded attribution evaluation. Through extensive analysis, we also identify a setup for consistent and fair benchmarking of attribution methods across different underlying methodologies. We then use this setup for a comprehensive comparison of existing methods on our benchmark. Finally, our analysis provides guidance for defending against backdoor attacks using existing attribution methods.
Supplementary Material: zip
Primary Area: visualization or interpretation of learned representations
Submission Number: 4414