The Difficulty of Training Sparse Neural Networks

Published: 04 Jun 2019, Last Modified: 03 Jul 2024, ICML Deep Phenomena 2019, Readers: Everyone
Keywords: sparse networks, pruning, energy landscape, sparsity
TL;DR: We highlight the difficulty of training sparse neural networks through interpolation experiments in the energy landscape.
Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work \citep{Gale2019, Liu2018} has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization to the ``good'' solution. Additionally, our attempts to find a decreasing objective path from ``bad'' solutions to the ``good'' ones in the sparse subspace fail. However, if we allow the path to traverse the dense subspace, then we consistently find a path between the two solutions. These findings suggest that traversing extra dimensions may be needed to escape stationary points found in the sparse subspace.
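The linear interpolation experiment mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' code: the function names `interpolate_loss` and `is_monotonically_decreasing`, the binary `mask` that defines the sparse subspace, and the toy quadratic objective, which stands in for the ResNet-50 training loss. Because the toy objective is convex, the path from initialization to its minimum is monotonically decreasing by construction; the sketch only shows how such a check could be set up.

```python
import numpy as np

def interpolate_loss(loss_fn, theta_start, theta_end, mask=None, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the line between two parameter vectors.

    If a binary mask is given, every interpolated point is projected onto the
    sparse subspace by zeroing the masked-out coordinates.
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = []
    for alpha in alphas:
        theta = (1.0 - alpha) * theta_start + alpha * theta_end
        if mask is not None:
            theta = theta * mask
        losses.append(loss_fn(theta))
    return alphas, np.array(losses)

def is_monotonically_decreasing(losses, tol=0.0):
    """Return True if no step along the path increases the loss by more than tol."""
    return bool(np.all(np.diff(losses) <= tol))

# Toy setup (hypothetical): 20 parameters, roughly 20% kept by the mask.
rng = np.random.default_rng(0)
dim = 20
mask = (rng.random(dim) < 0.2).astype(float)
theta_good = rng.normal(size=dim) * mask   # stand-in for the "good" sparse solution
theta_init = rng.normal(size=dim) * mask   # stand-in for the sparse initialization

def toy_loss(theta):
    # Convex quadratic with its minimum at theta_good; NOT the paper's objective.
    return float(np.sum((theta - theta_good) ** 2))

alphas, losses = interpolate_loss(toy_loss, theta_init, theta_good, mask=mask)
print("monotonically decreasing:", is_monotonically_decreasing(losses))
```

Passing `mask=None` and dense endpoints would let the same helper evaluate a straight-line path through the dense subspace. For a real network, `loss_fn` would need to load the interpolated parameters into the model and evaluate the training objective.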
Community Implementations: 1 code implementation (CatalyzeX)