Abstract: Despite neural networks' high performance, their lack of interpretability has been the main bottleneck to their safe use in practice. In high-stakes domains (e.g., medical diagnosis), gaining insight into the network is critical for earning trust and for adoption. One way to improve the interpretability of a neural network is to explain the importance of a particular concept (e.g., gender) in its predictions. This is useful both for explaining the reasoning behind the network's predictions and for revealing any biases the network may have. This work aims to provide quantitative answers to \textit{the relative importance of concepts of interest} via concept activation vectors (CAV). In particular, this framework enables non-machine-learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept). We show that a CAV can be learned from a relatively small set of examples. Testing with CAV can, for example, answer whether a particular concept (e.g., gender) is more important in predicting a given class (e.g., doctor) than another set of concepts. Interpreting with CAV does not require any retraining or modification of the network. We show that meaningful concepts are learned at many levels (e.g., color, texture, objects, a person's occupation), and we present the CAV's \textit{empirical deepdream}, in which we maximize an activation using a set of example pictures. We show how various insights can be gained from relative-importance testing with CAV.
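To make the idea concrete, here is a minimal sketch of one way a CAV can be obtained: train a linear classifier in a layer's activation space to separate activations of concept examples from activations of random counterexamples, and take the normalized normal vector of its decision boundary as the CAV. This is only an illustrative sketch with made-up shapes and stand-in data, not the tensorflow/tcav API; the function and variable names are hypothetical.

```python
# Illustrative sketch (not the tensorflow/tcav API): learn a CAV as the
# normal vector of a linear classifier that separates activations of
# concept examples from activations of random counterexamples at one layer.
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts, random_acts):
    # concept_acts, random_acts: (n_examples, n_units) activations at a chosen layer.
    X = np.concatenate([concept_acts, random_acts], axis=0)
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]  # points from the "random" side toward the "concept" side
    return cav / np.linalg.norm(cav)

# Stand-in activations for, e.g., ~100 concept images and ~100 random images;
# in practice these would come from a forward pass of the trained network.
rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(100, 512))
random_acts = rng.normal(loc=0.0, size=(100, 512))
cav = learn_cav(concept_acts, random_acts)
```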
TL;DR: This work aims to provide quantitative answers to the relative importance of concepts of interest via concept activation vectors (CAV). In particular, this framework enables non-machine-learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept). We show that a CAV can be learned from a relatively small set of examples. Hypothesis testing with CAV can answer whether a particular concept (e.g., gender) is more important in predicting a given class (e.g., doctor) than another set of concepts. Interpreting a network with CAV does not require any retraining or modification of the network.
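For the relative-importance test itself, a common formulation (the TCAV score) is to compute the directional derivative of a class logit along the CAV for each example of the class and report the fraction of examples for which this derivative is positive; comparing scores for two CAVs then indicates which concept the class prediction is more sensitive to. The sketch below assumes the gradients of the class logit with respect to the chosen layer's activations have already been computed; all names and data are illustrative, not part of the tensorflow/tcav API.

```python
# Illustrative sketch (not the tensorflow/tcav API): score a concept's
# importance for a class as the fraction of class examples whose class logit
# has a positive directional derivative along the CAV at a chosen layer.
import numpy as np

def concept_importance_score(grads_wrt_layer, cav):
    # grads_wrt_layer: (n_examples, n_units) gradients of the class logit
    # with respect to the layer activations; cav: (n_units,) unit vector.
    directional_derivatives = grads_wrt_layer @ cav
    return float(np.mean(directional_derivatives > 0))

# Hypothetical comparison: is a "gender" CAV or a "white coat" CAV more
# important for the "doctor" class? (Random stand-in data for illustration.)
rng = np.random.default_rng(0)
grads = rng.normal(size=(200, 512))
cav_gender = rng.normal(size=512); cav_gender /= np.linalg.norm(cav_gender)
cav_coat = rng.normal(size=512); cav_coat /= np.linalg.norm(cav_coat)
print(concept_importance_score(grads, cav_gender),
      concept_importance_score(grads, cav_coat))
```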
Code: [tensorflow/tcav](https://github.com/tensorflow/tcav) + [1 community implementation on Papers with Code](https://paperswithcode.com/paper/?openreview=S1viikbCW)