Considering Layerwise Importance in the Lottery Ticket Hypothesis

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023
Keywords: Lottery Ticket Hypothesis
Abstract: The recently introduced Lottery Ticket Hypothesis (LTH) posits that a sparse, trainable subnetwork can be extracted from a dense network using iterative magnitude pruning. By iteratively training the model, removing the connections with the lowest global weight magnitude, and rewinding the remaining connections, sparse networks can be extracted that, when fully trained, reach similar or better performance than their dense counterpart. Intuitively, comparing connection weights globally discards much of the context about how each connection relates to the other connections in its layer, as the weight distributions of layers throughout the network often differ significantly. In this paper we study a number of approaches that try to recover some of this layerwise distributional context by computing an importance value for each connection that depends on the weights of the other connections in the same layer. We then generalise the LTH to use weight importance rather than weight magnitude. Experiments using these importance metrics on several architectures and datasets reveal interesting aspects of the structure and emergence of lottery tickets. We find that, given a repeatable training procedure, applying different importance metrics leads to distinct, performant lottery tickets with few overlapping connections.
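The scoring step the abstract contrasts, global magnitude versus a layerwise importance value, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the particular metric shown (magnitude normalised by the layer's mean absolute weight), the toy two-layer weights, and all function names are our own assumptions.

```python
def layerwise_importance(weights):
    """One possible layerwise importance metric (an illustrative assumption):
    each weight's magnitude relative to the mean magnitude of its own layer,
    so layers with very different weight scales become comparable."""
    importance = {}
    for name, layer in weights.items():
        mean_abs = sum(abs(w) for w in layer) / len(layer)
        importance[name] = [abs(w) / mean_abs for w in layer]
    return importance

def prune_global(scores, sparsity):
    """Remove the fraction `sparsity` of connections with the lowest scores,
    compared across ALL layers, as in the original LTH procedure.
    Returns a binary mask (1 = keep) per layer."""
    flat = sorted(s for layer in scores.values() for s in layer)
    threshold = flat[int(len(flat) * sparsity) - 1]
    return {name: [1 if s > threshold else 0 for s in layer]
            for name, layer in scores.items()}

# Toy "network": two layers whose weight scales differ by 100x.
weights = {"layer1": [1.0, 2.0, 3.0, 4.0],
           "layer2": [0.01, 0.02, 0.03, 0.04]}

# Plain global magnitude pruning at 50% sparsity removes layer2 entirely...
mag_masks = prune_global({n: [abs(w) for w in v] for n, v in weights.items()}, 0.5)

# ...while the layerwise-normalised scores keep connections in both layers.
imp_masks = prune_global(layerwise_importance(weights), 0.5)
```

The toy example illustrates the abstract's motivation: when layer weight distributions differ strongly, a global magnitude threshold can wipe out entire layers, whereas a layerwise importance value preserves the within-layer ranking of every layer.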
Area: Deep Learning and representational learning
TL;DR: Using different importance measures in the LTH procedure to determine properties of the resulting lottery tickets.