WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
Abstract: Recent advancements in neural networks have showcased their remarkable capabilities across various domains. Despite these successes, the “black box” problem
remains. To address this, we propose a novel framework, WWW, that offers the ‘what’, ‘where’, and ‘why’
of the neural network decisions in human-understandable
terms. Specifically, WWW utilizes adaptive selection
for concept discovery, employing adaptive cosine similarity and thresholding techniques to effectively explain
‘what’. To address the ‘where’ and ‘why’, we propose
a novel combination of neuron activation maps (NAMs)
with Shapley values, generating localized concept maps and
heatmaps for individual inputs. Furthermore, WWW introduces a method for predicting uncertainty, leveraging
heatmap similarities to estimate the prediction’s reliability.
Experimental evaluations of WWW demonstrate superior
performance in both quantitative and qualitative metrics,
outperforming existing methods in interpretability. WWW
provides a unified solution for explaining ‘what’, ‘where’,
and ‘why’, introducing a method for localized explanations
from global interpretations and offering a plug-and-play solution adaptable to various architectures. Code is available
at: https://github.com/ailab-kyunghee/WWW
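
As a rough illustration of two ideas named in the abstract, the sketch below shows (1) a similarity-relative threshold for picking concepts and (2) cosine similarity between two heatmaps as a crude reliability signal. It is not taken from the WWW codebase: the function names, the `ratio` parameter, and the "keep everything within a fraction of the best score" rule are assumptions made for illustration only, and the paper's actual adaptive cosine similarity and thresholding may differ.

```python
import numpy as np


def select_concepts(similarities, ratio=0.95):
    """Keep concepts whose score is within `ratio` of the best score.

    `similarities` maps a concept name to its cosine similarity with a
    neuron's representation. The relative threshold is a hypothetical
    stand-in for the adaptive thresholding mentioned in the abstract.
    """
    if not similarities:
        return []
    best = max(similarities.values())
    kept = [(name, s) for name, s in similarities.items() if s >= ratio * best]
    # Highest-scoring concepts first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


def heatmap_cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two heatmaps of the same shape."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("heatmaps must have the same number of elements")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against all-zero heatmaps.
    return float(a @ b / max(denom, eps))


if __name__ == "__main__":
    scores = {"cat": 0.30, "dog": 0.29, "car": 0.10}
    print(select_concepts(scores))

    class_map = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical class-level heatmap
    sample_map = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical per-input heatmap
    print(heatmap_cosine_similarity(class_map, sample_map))
```

Under this reading, a low similarity between the heatmap for an individual input and a reference heatmap for the predicted class would flag the prediction as less reliable; which heatmaps are compared, and how the score is turned into an uncertainty estimate, is left to the paper and the linked repository.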