Abstract: When training neural networks, dying neurons, i.e., units that become inactive or saturated, are traditionally seen as harmful. This paper sheds new light on this phenomenon. By exploring the impact of various hyperparameter configurations on dying neurons during training, we gather insights into how sparse training approaches to pruning can be improved. We introduce Demon Pruning (DemP), a method that controls the proliferation of dead neurons through a combination of noise injection on active units and a one-cycle regularization schedule, dynamically inducing network sparsity. Experiments on the CIFAR-10 and ImageNet datasets demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs while speeding up training by up to 3.56$\times$. These findings provide a novel perspective on dying neurons as a resource for efficient model compression and optimization.
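To make the two ingredients named in the abstract concrete, here is a minimal sketch of (1) noise injection restricted to active units and (2) a one-cycle schedule for the regularization coefficient. All function names, shapes, and defaults below (e.g. `one_cycle_coefficient`, `inject_noise_on_active_units`, the sinusoidal schedule, the L1-style shrinkage step) are illustrative assumptions for exposition, not the authors' implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of the two mechanisms described in the abstract:
# noise injection on active units and a one-cycle regularization schedule.
# Names and defaults are illustrative, not taken from the DemP codebase.
import numpy as np

def one_cycle_coefficient(step, total_steps, peak=1e-4):
    """Regularization strength that ramps up to `peak` mid-training, then decays back to zero."""
    phase = step / max(total_steps - 1, 1)        # progress in [0, 1]
    return peak * np.sin(np.pi * phase)           # 0 -> peak -> 0

def inject_noise_on_active_units(weights, activations, sigma=1e-3, rng=None):
    """Add Gaussian noise only to rows (units) whose mean absolute activation is non-zero."""
    rng = np.random.default_rng() if rng is None else rng
    active = np.mean(np.abs(activations), axis=0) > 0.0   # per-unit activity mask
    noise = rng.normal(0.0, sigma, size=weights.shape)
    return weights + noise * active[:, None]              # leave dead units untouched

# Schematic training loop: regularization pushes weights toward zero (creating dead
# units), while noise on the still-active units controls how quickly units die.
total_steps = 1000
W = np.random.randn(64, 128) * 0.1                        # 64 units, 128 outgoing weights each
for step in range(total_steps):
    acts = np.maximum(0.0, np.random.randn(32, 64))       # stand-in for a batch of ReLU activations
    lam = one_cycle_coefficient(step, total_steps)
    W -= lam * np.sign(W)                                  # e.g. L1-style shrinkage
    W = inject_noise_on_active_units(W, acts)
```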
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/SimonDufLab/Maxwell_demon
Assigned Action Editor: ~Naigang_Wang1
Submission Number: 3738