A Fine-Grained Study of Interpretability of Convolutional Neural Networks for Text Classification

Published: 01 Jan 2022, Last Modified: 12 May 2023. HAIS 2022.
Abstract: In this work, we propose a new interpretability framework for convolutional neural networks trained for text classification. The objective is to study the interpretability of the convolutional layers that compose the architecture. The methodology identifies the words most relevant to the classification and, more generally, searches for the most relevant concepts learned in the internal representation of the CNN; here, the concepts studied are POS tags. Furthermore, we propose an iterative algorithm to determine the filters or neurons most relevant to the task. The outcome of this algorithm is a threshold used to mask the least active neurons and to focus the interpretability study on the most relevant parts of the network. The framework has been validated by explaining the internal representation learned for a well-known sentiment analysis task. As a result of this study, we found evidence that certain POS tags, such as nouns and adjectives, are more relevant to the classification. Moreover, we found evidence of redundancy among the filters of a convolutional layer.
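The threshold-based filter selection described in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not the authors' exact algorithm: it iteratively raises a threshold on mean filter activation, masking filters below it, and keeps the highest threshold at which a user-supplied evaluation score (e.g. validation accuracy with the masked filters zeroed out) stays within a tolerance of the unmasked baseline. The `evaluate` callable and all parameter names are assumptions for the sketch.

```python
import numpy as np

def iterative_activation_threshold(activations, evaluate, tolerance=0.01, steps=20):
    """Select an activation threshold that masks the least active filters.

    activations: (n_samples, n_filters) array of filter activations.
    evaluate:    callable(mask) -> score for the network with masked
                 filters disabled (hypothetical helper, e.g. val accuracy).
    Returns the selected threshold and the corresponding boolean mask
    (True = filter kept as relevant).
    """
    mean_act = activations.mean(axis=0)
    baseline = evaluate(np.ones_like(mean_act, dtype=bool))
    best_threshold = float(mean_act.min())
    best_mask = np.ones_like(mean_act, dtype=bool)
    # Sweep candidate thresholds from the weakest to the strongest filter.
    for t in np.linspace(mean_act.min(), mean_act.max(), steps):
        mask = mean_act >= t  # keep only the most active filters
        if not mask.any():
            break
        # Accept a higher threshold while the score stays close to baseline.
        if evaluate(mask) >= baseline - tolerance:
            best_threshold, best_mask = float(t), mask
    return best_threshold, best_mask
```

The interpretability study (e.g. POS-tag analysis) would then be restricted to the filters where the returned mask is True.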