The Difficulty of Training Sparse Neural Networks

May 17, 2019 ICML 2019 Workshop Deep Phenomena Blind Submission readers: everyone
  • Keywords: sparse networks, pruning, energy landscape, sparsity
  • TL;DR: In this paper we highlight the difficulty of training sparse neural networks by doing interpolation experiments in the energy landscape
  • Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work of \citep{Gale2019, Liu2018} has shown that sparse ResNet-50 architectures trained on ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization to the ``good'' solution. Additionally, our attempts to find a decreasing objective path from ``bad'' solutions to the ``good'' ones in the sparse subspace fail. However, if we allow the path to traverse the dense subspace, then we consistently find a path between two solutions. These findings suggest traversing extra dimensions may be needed to escape stationary points found in the sparse subspace.
0 Replies

Loading