Towards trustworthy explanations with gradient-based attribution methods

Published: 22 Oct 2021, Last Modified: 05 May 2023, NeurIPS-AI4Science Poster
Keywords: deep learning, interpretability, attribution methods, saliency, genomics, manifold mixup, computational biology
Abstract: The low interpretability of deep neural networks (DNNs) remains a key barrier to their widespread adoption in the sciences. Attribution methods offer a promising solution, providing feature importance scores that serve as first-order model explanations for a given input. In practice, gradient-based attribution methods, such as saliency maps, can yield noisy importance scores depending on model architecture and training procedure. Here we explore how various regularization techniques affect model explanations with saliency maps using synthetic regulatory genomic data, which allows us to quantitatively assess the efficacy of attribution maps. Strikingly, we find that better generalization performance does not imply better saliency explanations; however, unlike previous reports, we do not observe a clear tradeoff between the two. Interestingly, we find that conventional regularization strategies, when tuned appropriately, can yield high generalization and interpretability performance, similar to what can be achieved with more sophisticated techniques such as manifold mixup. Our work challenges the conventional wisdom that model selection should be based on test performance alone; another criterion is needed to sub-select models ideally suited for downstream post hoc interpretability for scientific discovery.
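The idea of scoring a saliency map against synthetic data with known ground truth can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions and is not the authors' pipeline. It substitutes a hypothetical logistic-regression "model" for a trained DNN so that the input gradient has a closed form. It also plants a hypothetical motif (`TGACTCA`) at a known position in a random DNA sequence, computes gradient-times-input saliency scores per position, and measures how well those scores recover the planted positions using AUROC. AUROC is an assumed metric chosen for illustration; the abstract does not specify which metric the paper uses. All names here (`predict`, `saliency`, `auroc`, the weights and the bias) are illustrative.

```python
import math
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an L x 4 list of 0/1 rows (order: A, C, G, T)."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Hypothetical toy model: logistic regression over position-specific weights."""
    z = bias + sum(w * v for w_row, x_row in zip(weights, x)
                   for w, v in zip(w_row, x_row))
    return sigmoid(z)

def saliency(weights, bias, x):
    """Per-position gradient-times-input scores for the toy model.

    For f(x) = sigmoid(b + sum W * x), df/dx[i][j] = f * (1 - f) * W[i][j].
    Multiplying by the one-hot input keeps only the gradient at the observed base.
    """
    f = predict(weights, bias, x)
    scale = f * (1.0 - f)
    return [sum(scale * w * v for w, v in zip(w_row, x_row))
            for w_row, x_row in zip(weights, x)]

def auroc(scores, labels):
    """Probability that a random true-motif position outscores a background one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data (assumption): a random background with one planted motif at a known position.
rng = random.Random(0)
motif, start, length = "TGACTCA", 20, 50
bases = [rng.choice(ALPHABET) for _ in range(length)]
bases[start:start + len(motif)] = list(motif)
seq = "".join(bases)
labels = [start <= i < start + len(motif) for i in range(length)]

# Hypothetical "trained" weights: strong on the motif, small Gaussian noise elsewhere.
weights = [[rng.gauss(0.0, 0.1) for _ in ALPHABET] for _ in range(length)]
for k, base in enumerate(motif):
    weights[start + k][ALPHABET.index(base)] = 2.0

bias = -12.0
x = one_hot(seq)
scores = [abs(s) for s in saliency(weights, bias, x)]
print(f"AUROC: {auroc(scores, labels):.3f}")  # prints AUROC: 1.000
```

In a real DNN the gradient would come from automatic differentiation rather than a closed form. Noisier saliency maps would push the AUROC below 1.0, and a drop of that kind is the effect that a quantitative comparison of regularization strategies would be looking for.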
Track: Original Research Track