Track: tiny / short paper (up to 4 pages)
Keywords: lottery ticket hypothesis, expander graphs, sparse neural networks, multi-task adaptation
TL;DR: Dual lottery ticket hypothesis via expander graph masks for adaptive finetuning
Abstract: Adapting foundation models to new tasks often involves modifying all model weights, leading to destructive interference such as catastrophic forgetting and degraded multi-task performance. Sparse adaptation methods like Lottery Ticket Adaptation (LoTA) mitigate these issues by optimizing only sparse subnetworks, achieving better results and enabling model merging across dissimilar tasks. Concurrently, the Dual Lottery Ticket Hypothesis (DLTH) states that randomly selected subnetworks can be transformed into a trainable condition that matches the performance of winning tickets. In this work, our goal is to explore the DLTH in sparse transformer finetuning tasks. We introduce a novel approach that employs expander graph masks, rather than random selection, to obtain the initial sparse subnetwork. In the first stage, by maintaining a high spectral gap through expander masks, we transform the selected subnetworks into trainable ones. This method not only improves accuracy over random pruning but also uses the same mask across all layers, simplifying the adaptation process. Our results demonstrate that expander-based initial pruning enhances sparse adaptation in foundation models, with the potential to address multi-task learning challenges without destructive interference.
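A minimal sketch of the masking idea described in the abstract, assuming the expander mask is realized as a random left-regular bipartite graph (such graphs have a large spectral gap with high probability); the construction, the `degree` value, and the shared-mask usage shown here are illustrative assumptions, not the paper's exact method:

```python
import torch

def expander_mask(out_features: int, in_features: int, degree: int) -> torch.Tensor:
    """Binary mask whose support is a random left-regular bipartite graph.

    Each output unit keeps exactly `degree` incoming connections chosen uniformly
    at random. Sparse random regular bipartite graphs are expanders with high
    probability, i.e. they tend to preserve a large spectral gap.
    (Illustrative construction; the paper's specific expander is not given here.)
    """
    mask = torch.zeros(out_features, in_features)
    for row in range(out_features):
        cols = torch.randperm(in_features)[:degree]
        mask[row, cols] = 1.0
    return mask

def apply_mask(linear: torch.nn.Linear, mask: torch.Tensor) -> None:
    """Zero out masked weights and keep them zero during sparse finetuning."""
    with torch.no_grad():
        linear.weight.mul_(mask)
    # Mask the gradient so pruned weights stay at zero through training.
    linear.weight.register_hook(lambda grad: grad * mask)

# Example: one mask reused across layers of the same shape, as the abstract describes.
hidden = 768       # hypothetical hidden size
degree = 32        # hypothetical fan-in (~4% density)
shared_mask = expander_mask(hidden, hidden, degree)
for layer in [torch.nn.Linear(hidden, hidden) for _ in range(2)]:
    apply_mask(layer, shared_mask)
```

Sharing a single mask across all layers, as the abstract notes, avoids storing or computing a separate pruning pattern per layer.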
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 27