Automatic Rule Extraction from Long Short Term Memory Networks

W. James Murdoch; Arthur Szlam

Published: 06 Feb 2017, Last Modified: 05 May 2023, ICLR 2017 Poster, Readers: Everyone

Abstract: Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear. As a result, these models are generally treated as black boxes, yielding no insight into the underlying learned patterns. In this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate a new approach for tracking the importance of a given input to the LSTM for a given output. By identifying consistently important patterns of words, we are able to distill state-of-the-art LSTMs on sentiment analysis and question answering into a set of representative phrases. This representation is then quantitatively validated by using the extracted phrases to construct a simple, rule-based classifier that approximates the output of the LSTM.

TL;DR: We introduce a word importance score for LSTMs, and show that we can use it to approximate an LSTM's performance using a simple, rule-based classifier.

Conflicts: fb.com, berkeley.edu

Keywords: Natural language processing, Deep learning, Applications
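
Illustration: The abstract describes building a simple, rule-based classifier from extracted phrases. The sketch below shows one plausible shape for such a classifier, assuming the phrases arrive as an ordered list with one label each and that the first matching phrase decides the output. The phrase list, the labels, the `classify` and `contains_phrase` names, and the default label are all hypothetical; none of them come from the paper, and the importance scoring that would produce the ordering is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical extracted phrases, ordered from most to least important.
# These entries are made up for illustration; they are not results
# reported by the authors.
RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("not", "good"), "negative"),
    (("waste", "of", "time"), "negative"),
    (("great",), "positive"),
    (("good",), "positive"),
]


def contains_phrase(tokens: Sequence[str], phrase: Tuple[str, ...]) -> bool:
    """Return True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(
        tuple(tokens[i:i + n]) == phrase
        for i in range(len(tokens) - n + 1)
    )


def classify(
    text: str,
    rules: List[Tuple[Tuple[str, ...], str]] = RULES,
    default: str = "positive",
) -> str:
    """Return the label of the first rule whose phrase appears in `text`.

    Assumption: whitespace tokenization and lowercasing are sufficient
    for the illustration; a real pipeline would reuse the tokenizer of
    the model being approximated.
    """
    tokens = text.lower().split()
    for phrase, label in rules:
        if contains_phrase(tokens, phrase):
            return label
    # No phrase matched: fall back to an assumed default class.
    return default
```

Because the rules are checked in order, the longer phrase "not good" is matched before the single word "good", which is why ordering by importance matters in this kind of classifier.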
