Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY 4.0
TL;DR: Lottery Ticket masks do not train well from a new random init. We show that permuting the mask to align with the new init's optimization basin yields a mask that trains better from random init and approaches LTH generalization performance.
Abstract: The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH sparsity mask does not generalize to other random weight initializations. Recent work suggests that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations, and we propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random init. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10/100 & ImageNet) and models (VGG11 & ResNet20/50).
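The sketch below illustrates the core idea of applying a per-layer permutation to an LTH mask before sparse training from a new random init. It is a minimal illustration, not the authors' implementation (see the linked repository for the actual code); the function name `permute_mask` and the toy shapes are hypothetical, and the permutations are assumed to be given, e.g. by a weight-matching procedure in the spirit of Git Re-Basin.

```python
import numpy as np

def permute_mask(masks, perms):
    """Permute each layer's binary mask to align with a new basin.

    masks : list of 2-D {0,1} arrays, masks[l] has shape (out_l, in_l).
    perms : list of index arrays, perms[l] permutes the out_l units of layer l.
            The inputs of layer l are permuted with perms[l-1] (none for l=0).
    """
    permuted = []
    prev_perm = None
    for l, mask in enumerate(masks):
        m = mask[perms[l], :]          # permute output units (rows)
        if prev_perm is not None:
            m = m[:, prev_perm]        # permute input units (columns)
        permuted.append(m)
        prev_perm = perms[l]
    return permuted

# Toy usage: a 2-layer MLP mask with hypothetical permutations.
rng = np.random.default_rng(0)
masks = [rng.integers(0, 2, size=(4, 3)), rng.integers(0, 2, size=(2, 4))]
perms = [rng.permutation(4), np.arange(2)]   # output layer kept fixed
aligned = permute_mask(masks, perms)
print([m.shape for m in aligned])            # [(4, 3), (2, 4)]
```

The permuted mask has the same sparsity level as the original; only which units each masked connection attaches to changes, so it can be applied to the new random initialization before sparse training.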
Lay Summary: Modern artificial intelligence (AI) systems are incredibly powerful but often require massive amounts of computing power and data to train. This makes them expensive and out of reach for many researchers and developers. To address this, scientists have been exploring “sparser” AI models—systems that use only a small fraction of their potential connections—making them much more efficient to train and run. However, a major hurdle is that a sparse model setup that works well with one starting point for training often fails when training begins from a different starting point. Our research identifies the root cause: misalignment. Think of it like using a key (the sparse setup) on a lock that has been rotated slightly—it just doesn’t fit. To solve this, we developed a method to “re-align” the sparse structure so it matches the patterns of a new starting point. This adjustment dramatically improves the performance of sparse models trained from different starting points, making them nearly as effective as their original versions. Our findings make it easier and more practical to develop leaner, more efficient AI systems, paving the way for broader accessibility and innovation in AI research.
Link To Code: https://github.com/calgaryml/sparse-rebasin
Primary Area: Deep Learning->Theory
Keywords: Lottery Ticket Hypothesis, sparse training, linear mode connectivity, weight symmetry, deep learning, deep neural networks, random initialization, git re-basin, optimization
Submission Number: 9409