- TL;DR: Instead of fine-tuning a pruned neural network, rewind its remaining weights to their values from earlier in training and re-train it to achieve higher accuracy.
- Abstract: Neural network pruning is a popular technique for reducing inference costs by removing connections, neurons, or other structure from the network. In the literature, pruning typically follows a standard procedure: train the network, remove unwanted structure (pruning), and train the resulting network further to recover accuracy (fine-tuning). In this paper, we explore an alternative to fine-tuning: rewinding. Rather than continuing to train the pruned network (fine-tuning), we rewind the remaining weights to their values from earlier in training and re-train the resulting network for the remainder of the original training process. We find that this procedure, which repurposes the strategy for finding lottery tickets presented by Frankle et al. (2019), makes it possible to prune networks further than fine-tuning allows for a given target accuracy, provided that the weights are rewound to a suitable point in training. We also find that, across all tested networks, there are wide ranges of suitable rewind points that achieve higher accuracy than fine-tuning. Based on these results, we argue that practitioners should explore rewinding as an alternative to fine-tuning for neural network pruning.
- Code: https://github.com/comparing-rewinding-finetuning/code
- Keywords: pruning, sparsity, fine-tuning, lottery ticket
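The procedure in the abstract (train, prune, then either fine-tune or rewind and re-train) can be sketched on a toy problem. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' code from the linked repository: the linear model, synthetic data, one-shot magnitude pruning, rewind point `K`, step learning-rate schedule, and equal `T - K` step budget for fine-tuning are all hypothetical choices made for the sketch. It shows only the mechanics of rewinding; it is not expected to reproduce the paper's accuracy results.

```python
import random

# Hypothetical toy setup (not from the paper): a linear model trained with
# full-batch gradient descent on synthetic data whose true weights are sparse.
random.seed(0)
DIM, N = 20, 200
W_TRUE = [random.gauss(0, 1) if i < 5 else 0.0 for i in range(DIM)]
X = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
Y = [sum(x * w for x, w in zip(row, W_TRUE)) for row in X]

T = 100               # length of the original training run, in steps
K = 10                # assumed rewind point (step to rewind the weights to)
PRUNE_FRACTION = 0.5  # fraction of weights removed in one pruning round
FINAL_LR = 0.005


def schedule(step):
    """Assumed original schedule: drop the learning rate 10x for the last 25% of training."""
    return 0.05 if step < 0.75 * T else FINAL_LR


def loss(w):
    """Mean squared error of the linear model on the synthetic data."""
    return sum((sum(x * wi for x, wi in zip(row, w)) - y) ** 2
               for row, y in zip(X, Y)) / N


def train(w, mask, start, end, lr_fn, snapshots=None):
    """Full-batch gradient descent for steps start..end-1.

    Pruned weights (mask == 0) are held at zero. If `snapshots` is a dict,
    snapshots[s] records the weights before step s (and after the last step),
    so training can later be rewound to any of those points.
    """
    w = [wi * m for wi, m in zip(w, mask)]
    for step in range(start, end):
        if snapshots is not None:
            snapshots[step] = list(w)
        residuals = [sum(x * wi for x, wi in zip(row, w)) - y for row, y in zip(X, Y)]
        grad = [2 * sum(r * row[j] for r, row in zip(residuals, X)) / N
                for j in range(DIM)]
        lr = lr_fn(step)
        w = [(wi - lr * g) * m for wi, g, m in zip(w, grad, mask)]
    if snapshots is not None:
        snapshots[end] = list(w)
    return w


def magnitude_prune(w, mask, fraction):
    """Return a new mask that removes the `fraction` of surviving weights with the smallest |w|."""
    alive = [i for i, m in enumerate(mask) if m]
    drop = set(sorted(alive, key=lambda i: abs(w[i]))[:int(len(alive) * fraction)])
    return [0 if i in drop else m for i, m in enumerate(mask)]


# 1. Train the dense network, keeping snapshots so we can rewind later.
w0 = [random.gauss(0, 0.1) for _ in range(DIM)]
snaps = {}
w_final = train(w0, [1] * DIM, 0, T, schedule, snaps)

# 2. Prune by magnitude using the fully trained weights.
mask = magnitude_prune(w_final, [1] * DIM, PRUNE_FRACTION)

# 3a. Fine-tuning: keep the trained surviving weights and train further at
#     the final learning rate (here for the same T - K step budget).
w_ft = train(w_final, mask, 0, T - K, lambda _step: FINAL_LR)

# 3b. Rewinding: reset the surviving weights to their step-K values and
#     replay the original schedule for the remaining T - K steps.
w_rw = train(snaps[K], mask, K, T, schedule)

print(f"dense loss:      {loss(w_final):.4f}")
print(f"fine-tuned loss: {loss(w_ft):.4f}")
print(f"rewound loss:    {loss(w_rw):.4f}")
```

The key difference between the two branches is where re-training starts. Fine-tuning continues from the final trained weights, while rewinding restores the surviving weights from the snapshot at step `K` and replays the rest of the original schedule. Sweeping `K` over several values is one way to look for the "suitable rewind points" the abstract describes.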