Detecting Memorization in ReLU Networks

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Blind Submission
Abstract: We propose a new notion of the 'non-linearity' of a network layer with respect to an input batch, based on its proximity to a linear system and reflected in the non-negative rank of the activation matrix. We measure this non-linearity by applying non-negative matrix factorization to the activation matrix. Considering batches of similar samples, we find that high non-linearity in deep layers is indicative of memorization. Furthermore, by applying our approach layer by layer, we find that the mechanism for memorization consists of distinct phases. We perform experiments on fully-connected and convolutional neural networks trained on several image and audio datasets. Our results demonstrate that, as an indicator of memorization, our technique can be used to perform early stopping.
Keywords: Memorization, Generalization, ReLU, Non-negative matrix factorization
TL;DR: We use the non-negative rank of ReLU activation matrices as a complexity measure and show it (negatively) correlates with good generalization.
Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist)
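
Below is a minimal sketch (not the authors' code) of the kind of measurement the abstract describes: take a ReLU layer's activation matrix for a batch of similar samples and check how well low-rank non-negative factorizations reconstruct it, using the residual as a proxy for the non-negative rank. The helper name, the chosen ranks, and the toy matrices are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def nonlinearity_curve(activations, ranks=(1, 2, 4, 8, 16)):
    """Relative NMF reconstruction error of a (batch x units) ReLU
    activation matrix at several target ranks. ReLU activations are
    non-negative by construction, so NMF applies directly."""
    A = np.asarray(activations, dtype=np.float64)
    norm_A = np.linalg.norm(A)  # Frobenius norm, used for normalization
    errors = []
    for r in ranks:
        model = NMF(n_components=r, init="nndsvda", max_iter=500, random_state=0)
        W = model.fit_transform(A)      # batch x r
        H = model.components_           # r x units
        errors.append(np.linalg.norm(A - W @ H) / norm_A)
    return dict(zip(ranks, errors))

# Toy usage: a near-rank-1 activation matrix (the layer acts almost like a
# linear system on this batch) is well explained at small rank, while a
# matrix with high non-negative rank needs many components -- the pattern
# the paper associates with memorization in deep layers.
rng = np.random.default_rng(0)
low_rank = np.outer(rng.random(32), rng.random(100))   # ~rank 1, non-negative
high_rank = rng.random((32, 100))                      # effectively full rank
print("near-linear layer:", nonlinearity_curve(low_rank))
print("high non-negative rank:", nonlinearity_curve(high_rank))
```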