Find A Winning Sign: Sign Is All We Need to Win the Lottery

ICLR 2025 Conference Submission 109 Authors (anonymous)

13 Sept 2024 (modified: 22 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: lottery ticket hypothesis, network pruning, linear mode connectivity
TL;DR: We demonstrate that an effective signed mask can allow a randomly initialized network to win the lottery.
Abstract: The lottery ticket hypothesis (LTH) posits the existence of a sparse subnetwork (a.k.a. winning ticket) that, when trained from initialization, can generalize comparably to its dense counterpart. However, early works fail to extend their observations and methods to large-scale settings. While recent methods, such as weight rewinding or learning rate rewinding (LRR), yield effective pruning, we note that they still struggle to identify a winning ticket. In this paper, we take a step closer to finding a winning ticket by arguing that a signed mask, i.e., a binary mask augmented with parameter sign information, can transfer the capability to achieve strong generalization after training (i.e., generalization potential) to a randomly initialized network. We first share our observation on the subnetwork trained by LRR: if the parameter signs are maintained, the LRR-driven subnetwork retains its generalization potential even when the parameter magnitudes are randomly re-initialized, excluding those of the normalization layers. However, this fails when the magnitudes of the normalization layer parameters are re-initialized as well. To address the significant influence of the normalization layer parameters, we propose AWS, a slight variation of LRR designed to find A Winning Sign. Specifically, we encourage a low error barrier along the linear path connecting the subnetwork trained by AWS to its counterpart with re-initialized normalization layer parameters, preserving the generalization potential even when all parameters are re-initialized. Interestingly, we observe that across various architectures and datasets, the signed mask of an AWS-driven subnetwork allows a randomly initialized network to perform comparably to a dense network, taking a step closer to the goal of LTH.
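For concreteness, the sketch below illustrates the two mechanisms the abstract describes, under one plausible reading: transplanting a signed mask (binary pruning mask times trained parameter signs) onto a freshly initialized network, and probing the error barrier along the linear path between two networks. This is a minimal sketch, not the paper's implementation; the function names, the name-based normalization-layer check, and the choice of interpolation steps are all assumptions introduced here for illustration.

```python
import copy
import torch

def signed_mask_init(trained, fresh, keep_norm=True):
    """Hypothetical sketch: transfer only the signed mask (pruning mask
    times trained signs) from a trained sparse network onto a randomly
    initialized copy of the same architecture."""
    out = copy.deepcopy(fresh)
    for (name, p_t), p_o in zip(trained.named_parameters(), out.parameters()):
        # Name-based heuristic for normalization layers (assumption: names
        # contain "bn" or "norm", as in common ResNet implementations).
        if keep_norm and ("bn" in name or "norm" in name):
            p_o.data.copy_(p_t.data)  # keep trained norm parameters
            continue
        mask = (p_t.data != 0).to(p_t.dtype)  # binary pruning mask
        sign = torch.sign(p_t.data)           # trained parameter signs
        # Random magnitudes with trained signs; pruned entries stay zero.
        p_o.data.copy_(mask * sign * p_o.data.abs())
    return out

@torch.no_grad()
def error_barrier(model_a, model_b, loss_fn, loader, steps=11):
    """Largest loss rise along the linear path between two networks,
    measured relative to the endpoints (a standard linear mode
    connectivity probe). Note: this interpolates parameters only;
    batch-norm running statistics are left as-is for simplicity."""
    def loss_at(alpha):
        m = copy.deepcopy(model_a).eval()
        for p, pa, pb in zip(m.parameters(),
                             model_a.parameters(),
                             model_b.parameters()):
            p.data.copy_((1 - alpha) * pa.data + alpha * pb.data)
        total = n = 0
        for x, y in loader:
            total += loss_fn(m(x), y).item() * y.size(0)
            n += y.size(0)
        return total / n
    losses = [loss_at(a.item()) for a in torch.linspace(0, 1, steps)]
    return max(losses) - max(losses[0], losses[-1])
```

In the abstract's terms, a low `error_barrier` between the trained subnetwork and its counterpart with re-initialized normalization parameters is the property AWS explicitly encourages during training.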
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 109