Local vs. Global interpretations for NLP

Anonymous

15 Oct 2020 (modified: 05 May 2023) · HAMLETS @ NeurIPS 2020 · Readers: Everyone
Keywords: NLP, interpretability, CNN, text classification
TL;DR: A word level score matrix that characterizes and explains a trained CNN
Abstract: Recently, WordsWorth (WW) scores have been proposed for calculating feature importance in traditional deep learning models trained for text classification tasks. Here, we experiment with the idea behind these scores and present them as a global explanation for a trained model. The interpretability literature shows that the delete-one (leave-one-out) method provides a good explanation for NLP tasks. Since WW scores act as a good proxy for delete-one scores in text classification, we extend the argument and use them for interpretation. We provide local and global explanations for a CNN trained on the IMDB reviews dataset and compare these scores with LIME. As with LIME, the global representation is a bag-of-words representation. Overall, we argue that evaluating a trained neural network on single words, placed at every possible position in the input text one at a time, gives powerful and valid insights into the workings of these otherwise black-box models. This is work in progress, and we are looking for further tests to evaluate the usefulness of our method.
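Below is a minimal sketch of the single-word evaluation procedure the abstract describes: each vocabulary word is placed, alone, at every position of an otherwise padded input, and the trained model is queried for each placement. The resulting word-by-position score matrix is the global, bag-of-words-style explanation. The model stub (`predict_proba`), the toy vocabulary, the pad id, and the fixed sequence length are all illustrative assumptions, not the authors' implementation.

```python
# Sketch of WordsWorth-style word-level scores.
# `predict_proba` below is a stand-in for a trained CNN text classifier;
# in practice it would wrap the real model's forward pass.
import numpy as np

PAD_ID = 0
SEQ_LEN = 8                                       # fixed input length (assumption)
VOCAB = {"great": 1, "terrible": 2, "movie": 3}   # toy vocabulary (assumption)

def predict_proba(batch: np.ndarray) -> np.ndarray:
    """Stand-in scorer: returns P(positive) per row of token ids.
    A toy rule pushes 'great' up and 'terrible' down."""
    pos = (batch == VOCAB["great"]).any(axis=1).astype(float)
    neg = (batch == VOCAB["terrible"]).any(axis=1).astype(float)
    return 0.5 + 0.4 * pos - 0.4 * neg

def wordsworth_matrix(vocab: dict, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Score matrix S[word, position]: the model's output when the input
    is all padding except `word` placed at `position`."""
    scores = np.zeros((len(vocab), seq_len))
    for row, word_id in enumerate(vocab.values()):
        batch = np.full((seq_len, seq_len), PAD_ID, dtype=np.int64)
        # one input per position, with the word on the diagonal
        batch[np.arange(seq_len), np.arange(seq_len)] = word_id
        scores[row] = predict_proba(batch)
    return scores

S = wordsworth_matrix(VOCAB)
# Global explanation: aggregate over positions -> one importance score per word.
for word, row in zip(VOCAB, S):
    print(f"{word:9s} mean score = {row.mean():.3f}")
```

Averaging the matrix over positions gives the global bag-of-words view; restricting the same per-word scores to the words actually present in one review would give a local explanation for that input, analogous to what LIME produces per instance.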