Fisher Divergence for Attribution through Stochastic Differential Equations

22 Jan 2025 (modified: 18 Jun 2025) · Submitted to ICML 2025 · CC BY 4.0
Abstract: Deep neural networks achieve remarkable performance but often lack interpretability, raising concerns in critical applications. Feature attribution methods, including perturbation-based methods, aim to address this by quantifying the contribution of input features to model outputs. However, existing methods often rely on narrowly defined perturbation spaces or on sampling from a predefined, large perturbation space, leading to incomplete or misleading explanations, especially in high-dimensional settings. To overcome these limitations, we propose a novel perturbation-based framework that leverages Stochastic Differential Equations (SDEs) to model continuous perturbations and explore the input space comprehensively and efficiently. By connecting Fisher Divergence with the time derivatives of KL divergence and mutual information, our approach provides a rigorous theoretical foundation for quantifying feature importance. Additionally, we integrate the Information Bottleneck (IB) principle into an optimization framework, ensuring the identification of the most informative features while maintaining predictive performance.
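
For reference (not taken from the submission), the connection the abstract alludes to is commonly formalized by a de Bruijn-type identity: when the input distribution and a reference distribution are perturbed by the same diffusion SDE, the time derivative of their KL divergence equals, up to a constant factor, their negative Fisher divergence. A minimal sketch, assuming both densities $p_t$ and $q_t$ (illustrative symbols, not the paper's notation) evolve under the Brownian perturbation $\mathrm{d}X_t = \mathrm{d}W_t$, i.e. the heat equation $\partial_t p_t = \tfrac{1}{2}\Delta p_t$:

% Sketch only: standard de Bruijn-type identity linking Fisher divergence
% to the time derivative of KL divergence along a shared heat-type flow.
\[
  \frac{\mathrm{d}}{\mathrm{d}t}\,
  D_{\mathrm{KL}}\!\left(p_t \,\|\, q_t\right)
  \;=\;
  -\tfrac{1}{2}
  \underbrace{\int p_t(x)\,
    \Bigl\lVert \nabla_x \log \tfrac{p_t(x)}{q_t(x)} \Bigr\rVert^2
  \,\mathrm{d}x}_{\text{Fisher divergence } \mathrm{FD}(p_t \,\|\, q_t)} .
\]

Integrating this identity over the perturbation time relates the accumulated change in KL divergence (and, analogously, mutual information) to the Fisher divergence along the SDE trajectory; the paper's attribution scores presumably build on such a relation, though the exact formulation is not stated in the abstract.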
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Stochastic Differential Equations, Fisher Divergence, Information Bottlenecks
Submission Number: 6205