Can we trust the attribution method?

06 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Attribution, Interpretability, XAI
Abstract: Attribution methods are essential for interpreting deep learning models, helping to align model decisions with human understanding. However, their trustworthiness remains uncertain. Previous work has highlighted several design flaws in attribution methods, such as the choice of reference points and the selection of attribution paths, but we argue that even a theoretically perfect attribution method, one that returns the true ground-truth attribution, cannot fully resolve the trust issue. For the first time, we characterize a specific manifestation of this issue: two samples that are arbitrarily close in input space, yet receive different classification results, can share the same important-feature attribution region. We rigorously derive this phenomenon and construct scenarios demonstrating that attribution trust issues persist even under ideal conditions. Our findings provide a new benchmark for evaluating attribution methods and highlight the need for cautious application in real-world scenarios. Our code is available at: https://anonymous.4open.science/r/Distrust-8677/
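The following is a minimal illustrative sketch, not the paper's construction: it assumes a hypothetical linear classifier and an input-times-gradient attribution, and shows how two inputs an arbitrarily small distance apart can fall on opposite sides of the decision boundary (different predictions) while their attribution maps are essentially identical.

```python
# Minimal sketch (not from the paper): a hypothetical linear classifier with
# input-times-gradient attribution. Two eps-close inputs straddle the decision
# boundary, so their predictions differ while their attributions nearly coincide.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear classifier: logit(x) = w . x + b
d = 8
w = rng.normal(size=d)
b = 0.0

def predict(x):
    """Binary decision from the sign of the logit."""
    return int(w @ x + b > 0)

def attribution(x):
    """Input-times-gradient attribution; for a linear model the gradient is w."""
    return w * x

# Place a point exactly on the decision boundary, then perturb it by +/- eps
# along the boundary normal so the two samples are eps-close yet classified differently.
x0 = rng.normal(size=d)
x0 -= (w @ x0 + b) / (w @ w) * w          # project onto the hyperplane w.x + b = 0
eps = 1e-6
x_pos = x0 + eps * w / np.linalg.norm(w)  # just above the boundary
x_neg = x0 - eps * w / np.linalg.norm(w)  # just below the boundary

print("predictions:", predict(x_pos), predict(x_neg))   # 1 vs 0
print("max attribution gap:",
      np.abs(attribution(x_pos) - attribution(x_neg)).max())
# The attribution maps differ only on the order of eps, so the "important
# feature region" is effectively the same despite opposite classifications.
```

Under these assumptions the largest per-feature attribution difference is of order eps, which illustrates the abstract's point: a ground-truth attribution can assign the same important region to two samples that the model classifies differently.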
Primary Area: interpretability and explainable AI
Submission Number: 2655