Track: Extended Abstract Track
Keywords: Lottery Ticket Hypothesis, sparse training, linear mode connectivity, weight symmetry, deep learning, deep neural networks, random initialization, git re-basin, optimization
TL;DR: Lottery Tickets can’t be trained from random init. We show that permuting the mask to align with the new init's optimization basin yields a mask that trains better from random init and approaches LTH generalization performance.
Abstract: The Lottery Ticket Hypothesis (LTH) suggests that there exists a sparse $\textit{winning ticket}$ mask and weights that achieve the same generalization performance as the dense model while using far fewer parameters. LTH achieves this by iteratively sparsifying and re-training within the pruned solution basin. This procedure is expensive, and any other random initialization sparsified with the winning ticket mask fails to reach good generalization performance. Recent work has suggested that Deep Neural Networks (DNNs) trained from random initializations find solutions within the same basin modulo weight symmetry, and has proposed a method to align trained models into a common basin. We propose permuting the winning ticket mask to align with the new optimization basin when performing sparse training from a random initialization different from the one used to derive the pruned mask. Using this permuted mask, we show that sparse training from random initialization generalizes significantly better than sparse training that naively uses the non-permuted mask.
Submission Number: 61
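The sketch below illustrates the core idea of the abstract: applying per-layer permutations to a winning ticket mask so it aligns with a new initialization's basin. It assumes the permutations have already been obtained from a Git Re-Basin-style weight-matching step; the function and variable names (`permute_linear_mask`, `perm_hidden`) are illustrative and not taken from the paper.

```python
# Minimal sketch (not the authors' code): permuting a winning ticket mask
# with per-layer unit permutations, assumed to come from a Git Re-Basin-style
# weight-matching procedure between the original and the new initialization.
import torch


def permute_linear_mask(mask: torch.Tensor,
                        out_perm: torch.Tensor,
                        in_perm: torch.Tensor) -> torch.Tensor:
    """Permute rows (output units) and columns (input units) of a binary mask.

    `out_perm` is this layer's unit permutation; `in_perm` is the previous
    layer's unit permutation, keeping consecutive layers consistent.
    """
    return mask[out_perm][:, in_perm]


# Toy two-layer MLP (4 -> 8 -> 3) with hypothetical winning-ticket masks.
torch.manual_seed(0)
mask1 = (torch.rand(8, 4) > 0.8).float()   # layer 1 mask, shape (out, in)
mask2 = (torch.rand(3, 8) > 0.8).float()   # layer 2 mask, shape (out, in)

perm_hidden = torch.randperm(8)            # stand-in for a matched hidden-unit permutation
identity_in = torch.arange(4)              # input units are never permuted
identity_out = torch.arange(3)             # output units are never permuted

mask1_aligned = permute_linear_mask(mask1, perm_hidden, identity_in)
mask2_aligned = permute_linear_mask(mask2, identity_out, perm_hidden)

# The aligned masks would then be applied to the new random initialization
# before sparse training, e.g. new_weight = new_weight * mask_aligned.
```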