Keywords: implicit, sparsity, adam, CNN, convnet, feature sparsity
TL;DR: Filter level sparsity emerges implicitly in CNNs trained with adaptive gradient descent approaches due to various phenomena, and the extent of sparsity can be inadvertently affected by different seemingly unrelated hyperparameters.
Abstract: We show implicit filter level sparsity manifests in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation, and are trained using adaptive gradient descent techniques with L2 regularization or weight decay. Through an extensive empirical study (Anonymous, 2019) we hypothesize the mechanism be hind the sparsification process. We find that the interplay of various phenomena influences the strength of L2 and weight decay regularizers, leading the supposedly non sparsity inducing regularizers to induce filter sparsity. In this workshop article we summarize some of our key findings and experiments, and present additional results on modern network architectures such as ResNet-50.