Saliency Maps Contain Network "Fingerprints"

Published: 25 Mar 2022, Last Modified: 23 May 2023
ICLR 2022 PAIR^2Struct Poster
Readers: Everyone
Abstract: Explaining deep learning models and their predictions remains an open problem, with many proposed solutions that are difficult to validate. This difficulty in assessing explanation methods raises questions about their validity: What do they show, and what factors influence the explanations? Furthermore, how should one choose which method to use? Here, we explore saliency-type methods and find that saliency maps contain network “fingerprints”, by which the network that generated a map can be uniquely identified. We test this by creating datasets of saliency maps from different “primary” networks, then training “secondary” networks on these saliency-map datasets. We find that secondary networks can learn to identify which primary network a saliency map comes from. Our findings hold across several saliency methods and for both CNN and ResNet “primary” architectures. Our analysis also reveals complex relationships between methods: one set of methods shares fingerprints, while other methods contain unique fingerprints. We discuss potentially related prior work that may explain some of these relationships: some methods are made up of “higher-order derivatives”. Our simple analytical framework is a first step towards understanding the ingredients of, and relationships between, many saliency methods.
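The primary/secondary setup described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the "primary" networks are tiny, untrained, randomly initialized one-hidden-layer tanh networks rather than CNNs or ResNets, the saliency method is the plain absolute input gradient, the "secondary" network is a logistic-regression classifier, and all function names (`make_primary`, `saliency_map`, `train_secondary`, and so on) are hypothetical.

```python
import math
import random


def make_primary(n_in, n_hidden, rng):
    """Hypothetical 'primary' network f(x) = w2 . tanh(W1 x) with random weights."""
    W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return W1, w2


def forward(net, x):
    """Scalar output of the primary network for input x."""
    W1, w2 = net
    return sum(v * math.tanh(sum(w * xi for w, xi in zip(row, x)))
               for row, v in zip(W1, w2))


def input_gradient(net, x):
    """Analytic gradient of the network output with respect to the input."""
    W1, w2 = net
    grad = [0.0] * len(x)
    for row, v in zip(W1, w2):
        h = math.tanh(sum(w * xi for w, xi in zip(row, x)))
        scale = v * (1.0 - h * h)  # derivative of tanh is 1 - tanh^2
        for j, w in enumerate(row):
            grad[j] += scale * w
    return grad


def saliency_map(net, x):
    """Gradient-magnitude saliency: |d f / d x|, one value per input feature."""
    return [abs(g) for g in input_gradient(net, x)]


def build_dataset(primaries, inputs):
    """Label each saliency map with the index of the primary that produced it."""
    return [(saliency_map(net, x), label)
            for label, net in enumerate(primaries)
            for x in inputs]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_secondary(data, n_in, epochs, lr, rng):
    """Stand-in 'secondary' network: logistic regression trained with SGD."""
    w, b = [0.0] * n_in, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, y in data:
            err = sigmoid(sum(wi * si for wi, si in zip(w, s)) + b) - y
            for j in range(n_in):
                w[j] -= lr * err * s[j]
            b -= lr * err
    return w, b


def accuracy(model, data):
    """Fraction of saliency maps attributed to the correct primary network."""
    w, b = model
    hits = sum(int((sum(wi * si for wi, si in zip(w, s)) + b > 0) == (y == 1))
               for s, y in data)
    return hits / len(data)


rng = random.Random(0)
n_in = 8
primaries = [make_primary(n_in, 16, rng) for _ in range(2)]
train_x = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(200)]
test_x = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(50)]

train = build_dataset(primaries, train_x)
test = build_dataset(primaries, test_x)
model = train_secondary(train, n_in, epochs=20, lr=0.05, rng=rng)
acc = accuracy(model, test)
print("secondary accuracy on held-out saliency maps:", acc)
```

In this toy setting, held-out accuracy above the 0.5 chance level would indicate that the saliency maps carry information about which primary network produced them. The sketch says nothing about the accuracies reported in the paper, which uses image models and several saliency methods.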