Abstract: Pruning units in a deep network can help speed up inference and training, as well as reduce the size of the model. We show that bias propagation (mean replacement) is a pruning technique that consistently outperforms the common approach of merely removing units, regardless of the architecture and the dataset. We also show how a simple adaptation to an existing scoring function, namely taking absolute values in a Taylor-approximation-based score, allows us to select the best units to prune. Finally, we show that the units selected by the best-performing scoring functions are somewhat consistent over the course of training, implying that the dead parts of the network appear during the early stages of training.
TL;DR: Mean Replacement is an efficient method to reduce the loss incurred by pruning, and Taylor-approximation-based scoring functions work better with absolute values.
Keywords: pruning, saliency, neural networks, optimization, redundancy, model compression
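A minimal NumPy sketch of bias propagation, assuming the pruned unit feeds a dense next layer with weights `W_out` and bias `b_out`: instead of simply deleting the unit, its mean activation (estimated on some data), scaled by its outgoing weights, is added to the next layer's bias. The helper names `prune_unit` and `output_mse` are hypothetical, and the data is random toy data, not results from this work.

```python
import numpy as np

def prune_unit(W_out, b_out, acts, unit, mean_replace=True):
    """Remove hidden unit `unit` from the layer feeding a dense layer.

    W_out: (n_out, n_hidden) weights of the next layer
    b_out: (n_out,) bias of the next layer
    acts:  (n_samples, n_hidden) hidden activations on some data
    """
    W_new = np.delete(W_out, unit, axis=1)  # drop the unit's outgoing weights
    b_new = b_out.copy()
    if mean_replace:
        # Bias propagation: fold the unit's mean activation into the next bias
        b_new += W_out[:, unit] * acts[:, unit].mean()
    return W_new, b_new

def output_mse(W_out, b_out, acts, unit, mean_replace):
    """Squared error between the next layer's outputs before and after pruning."""
    y = acts @ W_out.T + b_out
    W_new, b_new = prune_unit(W_out, b_out, acts, unit, mean_replace)
    y_pruned = np.delete(acts, unit, axis=1) @ W_new.T + b_new
    return np.mean((y_pruned - y) ** 2)

# Toy setup: ReLU-like hidden activations feeding a random dense layer
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(256, 8)), 0.0)
W = rng.normal(size=(4, 8))
b = rng.normal(size=4)

for mr in (False, True):
    print(f"mean_replace={mr}: output MSE = {output_mse(W, b, acts, 3, mr):.4f}")
```

For this single linear layer the mean is the best constant stand-in for the removed activation under squared error, so mean replacement can never yield a larger output error than plain removal; the effect on the final training loss of a deep network is what the abstract's experiments address.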