Soft iEP: On the Exploration Inefficacy of Gradient-Based Strong Lottery Exploration

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: strong lottery tickets, edge-pop, soft pruning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Edge-popup (EP) is the de facto algorithm for finding \emph{strong lottery tickets (SLT)}, sparse subnetworks that achieve high performance \emph{without any weight updates}. EP finds subnetworks by optimizing a score vector representing the importance of each edge and selecting a subnetwork from the optimized scores. This paper first shows that such a simple gradient-based method results in suboptimal solutions due to the existence of \emph{dying edges}. Specifically, we show that most edges are \emph{never} selected during the search, i.e., EP may be trapped in local minima near random subnetworks and fails to explore the entire space of subnetworks effectively. To mitigate this, we propose Soft iEP, an iterative variant of EP with soft pruning. Unlike standard iterative pruning, which masks out a fixed fraction of edges at each cycle and thus induces a problem similar to dying edges, Soft iEP \emph{does not} disable the bottom edges at each cycle, leaving every edge a chance to be selected at the end regardless of whether it was chosen in a former cycle. Empirical validation shows that iEP with soft pruning stably outperforms both EP and iEP with hard pruning on ImageNet, CIFAR-10, and CIFAR-100, and reduces dying edges. Notably, it discovers a subnetwork that is sparser than ResNet-34 yet exceeds the accuracy of a trained dense ResNet-34 on ImageNet by over 2.4\% (76.0\% with 20M parameters). Our results also provide new insight into why iterative pruning helps to find good sparse networks.
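The abstract contrasts hard iterative pruning, which permanently removes bottom-scored edges each cycle, with soft pruning, which keeps every edge selectable. A minimal NumPy sketch of this distinction (the function name and score values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def select_subnetwork(scores, sparsity):
    """Keep the top-(1 - sparsity) fraction of edges by score,
    in the spirit of edge-popup's score-based selection."""
    k = int(round(scores.size * (1.0 - sparsity)))
    mask = np.zeros_like(scores, dtype=bool)
    if k > 0:
        top = np.argpartition(scores.ravel(), -k)[-k:]
        mask.ravel()[top] = True
    return mask

# Hypothetical scores for 10 edges of a frozen random layer.
scores = rng.normal(size=10)

# Hard iterative pruning: edges dropped at cycle t leave the candidate
# pool for good, so they can never re-enter the subnetwork.
hard_pool = select_subnetwork(scores, sparsity=0.5)

# Soft pruning, as described above: bottom edges are *not* disabled,
# so every edge remains a candidate in the next cycle.
soft_pool = np.ones_like(scores, dtype=bool)

print(hard_pool.sum(), soft_pool.sum())  # 5 10
```

The sketch only illustrates the candidate-pool difference between the two pruning schemes; the actual method optimizes the scores by gradient descent between cycles.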
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4910