MEME: Generating RNN Model Explanations via Model Extraction

Anonymous

15 Oct 2020 (modified: 22 Oct 2023) · HAMLETS @ NeurIPS 2020
Keywords: Interpretability, Explainability, Concept Extraction, RNN, Model Extraction, Healthcare, Knowledge Extraction
TL;DR: Novel method for approximating RNNs with interpretable models represented by high-level concepts and their interactions.
Abstract: Recurrent Neural Networks (RNNs) have achieved remarkable performance on a range of tasks. A key step to further empowering RNN-based approaches is improving their explainability and interpretability. In this work we present MEME: a model extraction approach capable of approximating RNNs with interpretable models represented by human-understandable concepts and their interactions. We demonstrate how MEME can be applied to two multivariate, continuous-data case studies: Room Occupation Prediction and In-Hospital Mortality Prediction. Using these case studies, we show how the extracted models can be used to interpret RNNs both locally and globally, by approximating RNN decision-making via interpretable concept interactions.
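The core idea of model extraction, as the abstract describes it, is to distill a black-box model's behaviour into a transparent surrogate that a human can read. The following is a minimal, hypothetical sketch of that general technique (not the MEME implementation): a stand-in `black_box` function plays the role of the trained RNN, and a one-level decision stump is fit to mimic its predictions, yielding a single human-readable rule.

```python
# Hypothetical sketch of model extraction: distill a black-box predictor
# into an interpretable surrogate (a one-level decision stump).
import random

def black_box(x):
    # Stand-in for a trained RNN: here, "occupied" when the first
    # sensor reading is high. (Illustrative assumption, not MEME.)
    return 1 if x[0] > 0.6 else 0

random.seed(0)
X = [[random.random(), random.random()] for _ in range(500)]
y = [black_box(x) for x in X]  # query the black box for labels

def fit_stump(X, y):
    """Find the (feature, threshold) split that best mimics the labels."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            preds = [1 if x[f] > t else 0 for x in X]
            fidelity = sum(p == yi for p, yi in zip(preds, y)) / len(y)
            if best is None or fidelity > best[0]:
                best = (fidelity, f, t)
    return best

fidelity, feature, threshold = fit_stump(X, y)
print(f"surrogate rule: x[{feature}] > {threshold:.2f}  (fidelity {fidelity:.2%})")
```

The surrogate's quality is measured by *fidelity* (agreement with the black box) rather than accuracy on ground truth; MEME extends this idea from single-threshold rules to high-level concepts and their interactions over sequential data.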
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2012.06954/code)