Towards Robust Saliency Maps

Published: 05 Sept 2024, Last Modified: 16 Oct 2024 (ACML 2024 Conference Track, CC BY 4.0)
Keywords: Explainable AI, Saliency Maps, Formal methods, Neural network verification
TL;DR: How to use formal methods to verify saliency maps
Abstract: Saliency maps are among the most popular tools for interpreting the operation of a neural network: they highlight the input features deemed relevant to the final prediction, often subsets of pixels that are easy for a human to understand. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading. In this work, we propose a new neural network verification specification called saliency-robustness, which uses formal methods to prove a relationship between Vanilla Gradient (VG) -- a simple yet surprisingly effective saliency map method -- and the network's prediction: given a network, if an input $x$ emits a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify $x$ in a certain way. We then introduce a novel method that combines Marabou and Crown -- two state-of-the-art neural network verifiers -- to solve the proposed specification. Experiments on our synthetic dataset and on MNIST show that Vanilla Gradient is surprisingly effective as a certificate for the predicted output.
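To make the Vanilla Gradient method referenced above concrete, the sketch below shows how a VG saliency map is typically computed: the gradient of the predicted class score with respect to the input. This is a minimal illustration of the standard technique, not the paper's verification pipeline; the model, shapes, and absolute-value post-processing are assumptions for the example.

```python
# Minimal sketch of Vanilla Gradient (VG) saliency, assuming a PyTorch
# classifier and an image-shaped input tensor. Illustrative only.
import torch

def vanilla_gradient(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return the gradient of the predicted class score w.r.t. the input x."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    scores = model(x)                 # shape: (1, num_classes)
    target = scores.argmax(dim=1)     # predicted class index
    scores[0, target].backward()      # d(score of predicted class) / d(input)
    return x.grad.detach()            # same shape as x

# Hypothetical usage on an MNIST-sized input:
# model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
# x = torch.randn(1, 1, 28, 28)
# saliency = vanilla_gradient(model, x).abs()  # |gradient| is a common VG map
```

The saliency-robustness specification in the paper then asks whether such a VG map, by itself, already constrains the classification the network can assign to $x$.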
Primary Area: Trustworthy Machine Learning (accountability, explainability, transparency, causality, fairness, privacy, robustness, autoML, etc.)
Student Author: Yes
Submission Number: 117