The Difficulty of Training Sparse Neural Networks

Utku Evci; Fabian Pedregosa; Aidan Gomez; Erich Elsen

Published: 04 Jun 2019, Last Modified: 13 Apr 2025; Venue: ICML Deep Phenomena 2019; Readers: Everyone

Keywords: sparse networks, pruning, energy landscape, sparsity

TL;DR: We highlight the difficulty of training sparse neural networks through interpolation experiments in the energy landscape.

Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work (Gale et al., 2019; Liu et al., 2018) has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization to the "good" solution. Additionally, our attempts to find a decreasing-objective path from "bad" solutions to "good" ones within the sparse subspace fail. However, if we allow the path to traverse the dense subspace, we consistently find a path between the two solutions. These findings suggest that traversing extra dimensions may be needed to escape stationary points found in the sparse subspace.
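
The linear-path experiment described in the abstract can be illustrated with a small sketch. This is not the authors' code: the quadratic `toy_loss`, the four-element parameter vectors, the fixed `mask`, and all function names are assumptions chosen only to show the mechanics of evaluating an objective along the segment θ(α) = (1 − α)·θ_init + α·θ_good and checking whether it decreases monotonically. In the paper's setting the objective would be the training loss of a sparse ResNet-50, not a quadratic.

```python
# Toy sketch: evaluate a loss along the straight line between two parameter vectors.
# The quadratic loss and every name below are illustrative assumptions, not the paper's setup.

def interpolate(theta_a, theta_b, alpha):
    """Return (1 - alpha) * theta_a + alpha * theta_b, element-wise."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]

def toy_loss(theta, target):
    """Stand-in objective: squared distance to a fixed target vector."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))

def loss_along_path(theta_a, theta_b, loss_fn, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the segment from theta_a to theta_b."""
    alphas = [i / (num_points - 1) for i in range(num_points)]
    return [loss_fn(interpolate(theta_a, theta_b, a)) for a in alphas]

def is_monotonically_decreasing(values, tol=1e-12):
    """True if each value is no larger than its predecessor (up to tol)."""
    return all(later <= earlier + tol for earlier, later in zip(values, values[1:]))

# A fixed sparsity mask: masked-out entries are zero at both endpoints,
# so every interpolated point also stays inside the sparse subspace.
mask = [1, 0, 1, 0]
theta_init = [m * v for m, v in zip(mask, [0.5, -0.3, -1.0, 0.8])]
theta_good = [m * v for m, v in zip(mask, [2.0, 1.0, 1.5, -0.7])]

losses = loss_along_path(theta_init, theta_good, lambda th: toy_loss(th, theta_good))
print(is_monotonically_decreasing(losses))
```

Because both endpoints share the same mask, the straight line between them never leaves the sparse subspace. A path through the dense subspace, as in the abstract's final experiment, would instead let the masked coordinates take nonzero values at intermediate points.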

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/the-difficulty-of-training-sparse-neural/code)
