Keywords: mutual information
Abstract: Feature attribution methods aim to explain model predictions by identifying the input features that most influence the output. While perturbation-based methods are intuitive and widely used, they often rely on restricted or discretized perturbation spaces, limiting their ability to capture the complex dependencies present in high-dimensional data. In this work, we propose a novel framework that defines the perturbation process through a continuous-time stochastic differential equation (SDE), yielding a general, unbounded perturbation space. This formulation significantly increases the expressiveness of the perturbations while introducing new optimization challenges. To address these challenges, we derive a theoretical connection between the KL divergence and the Fisher divergence under general SDEs, and further establish that the time derivative of the mutual information between perturbed and original inputs is governed by the Fisher divergence. These results allow us to simplify the attribution objective and to compute pointwise information as feature importance scores. Empirical results on large-scale image classification tasks show that our method produces sharper, more coherent, and better-localized attribution maps than existing approaches, as demonstrated by both qualitative visualizations and quantitative evaluations.
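For concreteness, the following is a hedged sketch of the kind of identity the abstract alludes to, not necessarily the paper's exact statement. Assume the perturbation follows an Itô SDE $dX_t = f(X_t, t)\,dt + g(t)\,dW_t$ with state-independent diffusion coefficient $g(t)$, and that $p_t$ and $q_t$ both evolve under the corresponding Fokker–Planck equation with sufficient regularity. A standard de Bruijn-type argument then gives
\[
\frac{d}{dt}\, D_{\mathrm{KL}}\!\left(p_t \,\|\, q_t\right)
= -\tfrac{1}{2}\, g(t)^2\, \mathbb{E}_{x \sim p_t}\!\left[\left\| \nabla_x \log p_t(x) - \nabla_x \log q_t(x) \right\|^2\right],
\]
i.e. the KL decay rate is a (scaled) Fisher divergence. Writing the mutual information as $I(X_0; X_t) = \mathbb{E}_{x_0}\!\left[ D_{\mathrm{KL}}\!\left( p_t(\cdot \mid x_0) \,\|\, p_t(\cdot) \right) \right]$ and noting that the conditional and marginal densities evolve under the same dynamics, this yields
\[
\frac{d}{dt}\, I(X_0; X_t)
= -\tfrac{1}{2}\, g(t)^2\, \mathbb{E}_{x_0}\, \mathbb{E}_{x \sim p_t(\cdot \mid x_0)}\!\left[\left\| \nabla_x \log p_t(x \mid x_0) - \nabla_x \log p_t(x) \right\|^2\right],
\]
whose right-hand side is an expected Fisher divergence, consistent with the abstract's claim that the time derivative of mutual information is governed by the Fisher divergence. The per-sample integrand of the outer expectation is one natural candidate for the "pointwise information" used as a feature importance score, though the paper's precise definition may differ.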
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 8573