Abstract: Interpreting deep neural networks is of great importance to
understand and verify deep models for natural language processing
(NLP) tasks. However, most existing approaches only
focus on improving the performance of models but ignore
their interpretability. In this work, we propose an approach
to investigate the meaning of hidden neurons in convolutional
neural network (CNN) models. We first employ
saliency maps and optimization techniques to approximate the
information that hidden neurons detect in input sentences.
Then we develop regularization terms and explore words in the
vocabulary to interpret such detected information. Experimental
results demonstrate that our approach can identify
meaningful and reasonable interpretations for hidden spatial
locations. Additionally, we show that our approach can describe
the decision procedure of deep NLP models.
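The first step the abstract describes — using saliency to approximate what a hidden CNN neuron detects in an input sentence — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the embedding matrix, the single conv filter, and the finite-difference gradient (standing in for backpropagated saliency) are all assumptions made for the example.

```python
import numpy as np

# Toy setup (illustrative assumptions): a 5-word sentence with 4-dim
# embeddings, and one conv filter of width 2 whose ReLU output at a
# spatial position is the "hidden neuron" we inspect.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 4))   # (num_words, embed_dim)
conv_filter = rng.normal(size=(2, 4))  # (filter_width, embed_dim)

def hidden_activation(emb, pos):
    """ReLU activation of the conv neuron at spatial position `pos`."""
    window = emb[pos:pos + conv_filter.shape[0]]
    return max(0.0, float(np.sum(window * conv_filter)))

def saliency(emb, pos, eps=1e-4):
    """Per-word saliency of the neuron at `pos`, via finite differences
    (a stand-in for the gradient-based saliency maps the abstract cites)."""
    grads = np.zeros_like(emb)
    for i in range(emb.shape[0]):
        for j in range(emb.shape[1]):
            bumped = emb.copy()
            bumped[i, j] += eps
            grads[i, j] = (hidden_activation(bumped, pos)
                           - hidden_activation(emb, pos)) / eps
    # Collapse embedding dimensions into one nonnegative score per word.
    return np.abs(grads).sum(axis=1)

scores = saliency(embeddings, pos=1)
# Words outside the neuron's receptive field (here, positions 1-2)
# necessarily receive zero saliency.
print(scores)
```

In a real model one would backpropagate the neuron's activation to the embedding layer instead of perturbing each coordinate, but the ranking of words by influence is the same idea: the highest-scoring words approximate what the neuron detects.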