Keywords: Convolutional Neural Networks, Interpretability, Deep Learning
Abstract: This paper presents an unsupervised method for training a neural network, called an explainer, to diagnose the part information that a pre-trained convolutional neural network (CNN) uses for inference. The explainer works like an auto-encoder: it quantitatively disentangles part features from intermediate-layer features and uses these part features to reconstruct the CNN features without much loss of information. Disentangling and quantifying part information helps people understand the intermediate-layer features used by the CNN. More crucially, we learn the explainer via knowledge distillation, without using any annotations of object parts or textures for supervision. In experiments, we applied our method to diagnose features of different benchmark CNNs, and the explainers significantly boosted feature interpretability.
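The auto-encoder analogy in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the linear encoder and decoder, the ReLU, the mean-squared reconstruction loss, and every name below (explainer_forward, reconstruction_loss, W_enc, W_dec) are assumptions chosen for illustration only. The sketch shows the one idea the abstract states: intermediate CNN features are mapped to a set of "part" features, which are then used to reconstruct the original features, and the reconstruction error is the only training signal, so no part annotations are needed.

```python
# Toy explainer: encode CNN features into "part" features, then reconstruct.
# All shapes, weights, and function names here are hypothetical.

def relu(v):
    # Element-wise ReLU on a list of floats.
    return [max(0.0, a) for a in v]

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def explainer_forward(x, W_enc, W_dec):
    # Encoder: intermediate-layer features -> assumed "part" features.
    part = relu(matvec(W_enc, x))
    # Decoder: part features -> reconstructed CNN features.
    x_hat = matvec(W_dec, part)
    return part, x_hat

def reconstruction_loss(x, x_hat):
    # Mean squared error between the CNN's features and the reconstruction.
    # The target is the pre-trained CNN's own output, not a human label.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def identity(n):
    # n x n identity matrix, used as a trivial encoder/decoder in the demo.
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Stand-in for one feature vector taken from an intermediate CNN layer.
x = [0.5, 1.2, 0.0, 2.0]
part, x_hat = explainer_forward(x, identity(4), identity(4))
loss = reconstruction_loss(x, x_hat)
```

With identity weights and non-negative inputs the reconstruction is exact, so the loss is zero. In a real setting the encoder and decoder would be trained to minimize this loss over many feature maps, with additional structure to make the part features interpretable.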