When less is more: Simplifying inputs aids neural network understanding

Published: 28 Jan 2022 · Last Modified: 22 Oct 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: interpretability, compression, network training
Abstract: Are all bits useful? In this work, we propose SimpleBits, a method to synthesize simplified inputs by reducing their information content, and we carefully measure the effect of such simplification on learning. Crucially, SimpleBits does not require any domain-specific knowledge to constrain which input features should be removed. Instead, SimpleBits learns to remove the input features that are least relevant to a given task. Concretely, we jointly optimize for input simplification, by reducing inputs' bits per dimension as measured by a pretrained generative model, and for classification performance. We apply the simplification approach to a wide range of scenarios: conventional training, dataset condensation, and post-hoc explanations. In this way, we analyze what simplified inputs tell us about the decisions made by classification networks. We show that our simplification approach successfully removes superfluous information for tasks with injected distractors. When applied post-hoc, our approach provides intuition into the reasons for misclassifications by conventionally trained classifiers. Finally, for dataset condensation, we find that inputs can be simplified with only minimal accuracy degradation. Overall, our learning-based simplification approach offers a valuable new tool for exploring the basis of network decisions.
One-sentence Summary: Simplifying inputs to contain fewer bits can help in understanding deep neural network behavior.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2201.05610/code)
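
The abstract describes a joint objective: a classification loss plus a bits-per-dimension penalty computed by a pretrained generative model. The snippet below is a minimal PyTorch sketch of that idea, not the authors' released implementation; `classifier`, `density_model` (assumed to expose a `log_prob` method), and the trade-off weight `lam` are placeholder names and hyperparameters.

```python
import math
import torch
import torch.nn.functional as F

def simplify_input(x, y, classifier, density_model, lam=0.1, steps=200, lr=0.05):
    """Optimize a simplified copy of x that the classifier still labels as y
    while its bits per dimension under the density model decreases.
    Sketch only; names and hyperparameters are illustrative assumptions."""
    x_simple = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_simple], lr=lr)
    dims = x[0].numel()  # number of dimensions per example
    for _ in range(steps):
        optimizer.zero_grad()
        # Task term: keep the simplified input correctly classified.
        task_loss = F.cross_entropy(classifier(x_simple), y)
        # Simplification term: bits per dimension = NLL in nats / (D * ln 2).
        nll = -density_model.log_prob(x_simple)  # assumed log_prob API
        bpd = nll / (dims * math.log(2.0))
        loss = task_loss + lam * bpd.mean()
        loss.backward()
        optimizer.step()
    return x_simple.detach()
```

Which variables are optimized and how the trade-off weight is set will differ across the scenarios listed in the abstract (conventional training, dataset condensation, post-hoc explanations); the sketch only illustrates the shared loss structure.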
