On the Effect of Gradient Regularisation on Interpretability

Published: 15 Oct 2025, Last Modified: 31 Oct 2025
Venue: BNAIC/BeNeLearn 2025 (Oral)
License: CC BY 4.0
Track: Type A (Regular Papers)
Keywords: Gradient Regularisation, Saliency Map, Faithfulness, Robustness
Abstract: This paper explores the impact of input gradient regularisation on model interpretability. Although this technique has long been known to improve the generalisation ability of deep neural networks, it was only recently highlighted that regularising the input gradient may also enhance its interpretability as a saliency map. However, prior work has only observed this effect subjectively, and a quantitative evaluation is currently lacking. We aim to fill this gap by quantifying the influence of gradient regularisation on the quality of gradient-based saliency maps across multiple metrics and datasets. We find that gradient regularisation can indeed increase the quality of saliency maps, although this effect depends heavily on the specific dataset and model. This finding implies that subjective observations regarding the quality of saliency maps are not guaranteed to generalise to different datasets.
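For readers unfamiliar with the two techniques the abstract refers to, the sketch below illustrates input gradient regularisation (adding the squared norm of the loss gradient with respect to the input to the training objective) and a plain gradient saliency map. This is a minimal hypothetical illustration, not the paper's implementation; the penalty weight `lam` and the helper names `regularised_loss` and `gradient_saliency` are assumptions.

```python
# Minimal sketch (not the authors' code): input gradient regularisation
# penalises the squared norm of the loss gradient w.r.t. the input; the
# same input gradient is what gradient-based saliency maps visualise.
import torch
import torch.nn.functional as F

def regularised_loss(model, x, y, lam=0.1):
    """Cross-entropy loss plus an input-gradient penalty (lam is an assumed weight)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the input; create_graph=True makes the
    # penalty itself differentiable so it can be trained through.
    (grad_x,) = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad_x.pow(2).flatten(1).sum(dim=1).mean()
    return loss + lam * penalty

def gradient_saliency(model, x, target):
    """Plain gradient saliency map: |d(score of target class) / d(input)|."""
    x = x.clone().requires_grad_(True)
    score = model(x)[torch.arange(x.size(0)), target].sum()
    (grad_x,) = torch.autograd.grad(score, x)
    return grad_x.abs()
```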
Serve As Reviewer: ~Arne_Gevaert1
Submission Number: 4