Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
ICLR 2022 Submitted
Readers: Everyone
Keywords: explainability, interpretability, saliency maps, parameter saliency
Abstract: Conventional saliency maps highlight input features to which neural network predictions are highly sensitive. We take a different approach to saliency, in which we identify and analyze the network parameters, rather than inputs, that are responsible for erroneous decisions. We first verify that the identified salient parameters are indeed responsible for misclassification by showing that turning these parameters off improves predictions on the associated samples more than pruning the same number of random or least salient parameters. We further validate the link between salient parameters and network misclassification errors by observing that fine-tuning a small number of the most salient parameters on a single sample corrects errors on other samples that were misclassified for similar reasons, namely its nearest neighbors in saliency space. After validating our parameter-space saliency maps, we demonstrate that samples which cause similar parameters to malfunction are semantically similar. Further, we introduce an input-space saliency counterpart that reveals how image features cause specific network components to malfunction.
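
To make the verification step concrete, below is a minimal PyTorch-style sketch (not the authors' code): it treats a parameter's saliency as the magnitude of the loss gradient on a single misclassified sample, then zeros out the most salient parameters to check whether the prediction improves. The function names, the use of raw gradient magnitude, and the cutoff-based masking are illustrative assumptions; the paper's actual saliency may aggregate and normalize gradients differently.

import torch
import torch.nn.functional as F

def parameter_saliency(model, x, y):
    # |dL/dtheta| for every parameter, flattened into a single vector.
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return torch.cat([p.grad.abs().flatten()
                      for p in model.parameters() if p.grad is not None])

def zero_top_salient(model, saliency, k):
    # Zero (approximately) the k parameters with the largest saliency, in place.
    # Relies on gradients from the parameter_saliency() call still being populated.
    cutoff = torch.topk(saliency, k).values.min()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            # Keep parameters whose gradient magnitude falls below the cutoff.
            p.mul_(p.grad.abs() < cutoff)

# Hypothetical usage on one misclassified sample (x, y_true):
#   sal = parameter_saliency(model, x, y_true)
#   zero_top_salient(model, sal, k=1000)
#   then re-run model(x) and compare against zeroing k random or least salient parameters.

The same machinery extends to the fine-tuning experiment described in the abstract: instead of zeroing the salient parameters, one would update only those parameters on the misclassified sample and measure error correction on its nearest neighbors in saliency space.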
One-sentence Summary: We develop a method to identify salient network parameters (rather than input features) that are responsible for erroneous decisions.
Supplementary Material: zip