What's in the Box? Exploring the Inner Life of Neural Networks with Robust Rules

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Neural Networks, CNN, explaining, interpretable, Rules, black box
Abstract: We propose a novel method for exploring how neurons within a neural network interact. In particular, we consider the activation values of a network for given data, and propose to mine noise-robust rules of the form $X \rightarrow Y$, where $X$ and $Y$ are sets of neurons in different layers. To ensure we obtain a small and non-redundant set of high-quality rules, we formalize the problem in terms of the Minimum Description Length principle, by which we identify the best set of rules as the one that best compresses the activation data. To discover good rule sets, we propose the unsupervised ExplaiNN algorithm. Extensive evaluation shows that our rules give clear insight into how networks perceive the world: they identify shared and class-specific traits, compositionality within the network, as well as locality in convolutional layers. Our rules are not only easily interpretable, they also super-charge prototyping, as they identify which groups of neurons to consider in unison.
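To make the abstract's setup concrete, the sketch below shows the general idea of mining rules between sets of neurons in two layers under a description-length objective. It is a minimal illustration in Python, not the authors' ExplaiNN algorithm: the binarization threshold, the simplified two-part MDL-style score, and the exhaustive search over small candidate rules are all assumptions made for this example.

```python
# Illustrative sketch of mining rules X -> Y between two layers of
# binarized activations -- NOT the paper's ExplaiNN implementation.
import numpy as np
from itertools import combinations

def binarize(acts, thresh=0.0):
    """Turn real-valued activations (n_samples x n_neurons) into 0/1
    indicators of whether each neuron fires above a threshold."""
    return (acts > thresh).astype(np.uint8)

def rule_cost(X_cols, Y_cols, A, B):
    """Simplified two-part cost of a rule X -> Y: bits to encode the rule
    itself, plus bits to correct samples where X fires but some neuron in
    Y does not (a crude stand-in for a noise-robust MDL encoding)."""
    x_fires = A[:, X_cols].all(axis=1)       # samples where all of X fire
    n = x_fires.sum()
    if n == 0:
        return np.inf                        # rule never applies
    errors = (B[x_fires][:, Y_cols] == 0).sum()
    model_bits = len(X_cols) * np.log2(A.shape[1]) \
               + len(Y_cols) * np.log2(B.shape[1])
    data_bits = errors * np.log2(B.shape[0])  # pay per needed correction
    return model_bits + data_bits - n * len(Y_cols)  # reward coverage

def mine_top_rules(A, B, x_size=2, y_size=2, top_k=5):
    """Exhaustively score small candidate rules between two layers and
    keep the best few -- a toy stand-in for an unsupervised search."""
    scored = []
    for X in combinations(range(A.shape[1]), x_size):
        for Y in combinations(range(B.shape[1]), y_size):
            scored.append((rule_cost(list(X), list(Y), A, B), X, Y))
    return sorted(scored)[:top_k]

# Toy usage: activations of two consecutive layers on 200 inputs, where
# layer 2 partly depends on the first three neurons of layer 1.
rng = np.random.default_rng(0)
layer1 = binarize(rng.normal(size=(200, 6)))
layer2 = binarize(layer1[:, :3].sum(axis=1, keepdims=True)
                  + rng.normal(size=(200, 4)), thresh=1.5)
for cost, X, Y in mine_top_rules(layer1, layer2):
    print(f"rule {set(X)} -> {set(Y)}  score={cost:.1f}")
```

The point of the sketch is the shape of the problem: rules relate groups of neurons across layers, and a compression-style score trades off rule complexity against how well the rule predicts downstream activations; the paper's actual encoding and search are more refined.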
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose a rule mining approach that reveals how neural networks perceive the world.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=lxADlT2yEw