Subset Selection-based Attribution Regularization for Rational and Stable Interpretability

16 Sept 2025 (modified: 18 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: attribution regularization, attribution invariance, attribution rationality
Abstract: While explainable AI (XAI) has developed numerous attribution mechanisms to enhance model transparency, existing post-hoc methods are limited to improving attribution faithfulness. In contrast, attribution invariance and rationality stem from the model's internal parameters and can only be improved through specialized constraints during training. Current training strategies fall short on both fronts: supervised rationality-enhancing methods depend on manual annotations whose human priors may conflict with the model's intrinsic decision reasoning, while self-supervised invariance regularization methods rely on gradient-based attribution methods (e.g., Grad-CAM) with low faithfulness, yielding explanations misaligned with the model's actual logic. This not only hinders attribution refinement but also adversely affects task performance. To overcome these challenges, we introduce a training framework grounded in high-faithfulness submodular attribution, which extracts compact, discriminative pseudo-ground-truth regions without manual supervision. By integrating spatial constraints and high-confidence sample filtering, our approach suppresses irrelevant areas and supplies high-quality attribution targets that positively guide model training. However, submodular attribution involves a black-box greedy search that is non-differentiable and path-dependent. We therefore propose a submodular ranking loss that enforces search-path consistency and termination alignment under geometric transformations, enabling differentiable optimization of the greedy search process. Extensive evaluation across classification accuracy, attribution stability, faithfulness, rationality, and precision shows that our method significantly enhances attribution quality with minimal effect on task performance.
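To make the ranking-loss idea concrete, the minimal sketch below (not the authors' released code) shows one plausible way such a consistency objective could be written: it assumes per-region marginal-gain scores from the attribution search on an input and its geometrically transformed counterpart (with the transform's region permutation already undone), and penalizes changes in the greedy selection order and in the stopping region's score. All tensor shapes, the margin parameter, and the function name `submodular_ranking_loss` are illustrative assumptions.

```python
# Minimal sketch (assumed interface, not the paper's implementation):
# a ranking-style consistency loss that encourages the greedy submodular
# search to visit regions in the same order for an input and its
# geometrically transformed counterpart.
import torch
import torch.nn.functional as F

def submodular_ranking_loss(scores_orig, scores_aug, margin=0.1):
    """scores_orig, scores_aug: (B, R) marginal-gain scores for R candidate
    regions from the attribution search on the original and transformed
    inputs, with regions aligned across the transform."""
    # Order induced by the greedy search on the original input.
    order = scores_orig.argsort(dim=1, descending=True)
    ranked_aug = torch.gather(scores_aug, 1, order)
    # Pairwise hinge: a region selected earlier on the original input should
    # also outscore later regions on the transformed input, making the
    # greedy search path transform-consistent.
    diff = ranked_aug[:, :-1] - ranked_aug[:, 1:]
    path_loss = F.relu(margin - diff).mean()
    # Termination alignment: the top (stopping) region should receive a
    # similar score under both views.
    stop_loss = (ranked_aug[:, 0] - scores_orig.gather(1, order)[:, 0]).abs().mean()
    return path_loss + stop_loss
```

Because the loss operates on the score vectors rather than on the discrete selection itself, gradients flow through the scoring network even though the greedy search is itself non-differentiable; this is one standard way to relax a ranking constraint, and is offered here only as an interpretation of the abstract's description.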
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 7800