Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: neural networks, sparsity, gradient flow, sparse network optimization
Abstract: Training sparse networks to converge to the same performance as dense neural architectures has proven to be elusive. Recent work suggests that initialization is the key. However, while this direction of research has had some success, focusing on initialization alone appears to be inadequate. In this paper, we take a broader view of training sparse networks and consider various choices made during training that might disadvantage sparse networks. We measure the gradient flow across different networks and datasets, and show that the default choices of optimizers, activation functions and regularizers used for dense networks can disadvantage sparse networks. Based upon these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime. Our work suggests that initialization is only one piece of the puzzle and a wider view of tailoring optimization to sparse networks yields promising results.
One-sentence Summary: We use gradient flow to study sparse network optimization.
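
The summary above centers on measuring gradient flow in sparse networks. As a rough illustration only (not the paper's code, and not necessarily its exact metric), the sketch below computes per-layer gradient norms restricted to the surviving weights of a magnitude-pruned PyTorch model; the model, sparsity level, and data are all hypothetical choices made for this example.

```python
# Minimal sketch (not from the paper): inspecting per-layer gradient flow
# in a magnitude-pruned network. All names and settings are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small dense MLP; the layer sizes are arbitrary for illustration.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Build binary masks that zero out ~90% of each weight matrix by magnitude.
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:  # prune only weight matrices, not biases
        k = int(0.9 * p.numel())
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()
        p.data.mul_(masks[name])  # apply the sparsity mask

# One training-style step on random data so gradients exist.
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# "Gradient flow" is taken here as the gradient norm over surviving
# (unmasked) weights -- one plausible proxy, not the paper's definition.
for name, p in model.named_parameters():
    if name in masks:
        g = p.grad * masks[name]
        print(f"{name}: grad norm over surviving weights = {g.norm().item():.4f}")
```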
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=pITyXorUI
