Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: Sparse Fine-Tuning, Alignment, Subspace Regularization
TL;DR: We introduce DARS, a novel sparse fine-tuning method that preserves the alignment structure of pre-trained networks via adaptive subspace regularization.
Abstract: Recent works have identified alignment, which measures layerwise weight correlation, as a novel yet crucial mechanism for feature learning. We investigate an underlying connection between alignment learning and the structural fitting of a network to the span of its training data. Based on this insight, we further demonstrate that fine-tuning on out-of-distribution (OOD) data disrupts the well-aligned structure fitted during pre-training, degrading generalization performance. To address this, we propose DARS, DisAlignment-Regularized Sparse fine-tuning, a novel sparse fine-tuning approach that mitigates disalignment by partially constraining gradient updates to the principal subspace of the pre-trained network, constructed from the in-distribution (ID) data used for pre-training. Specifically, we define two disjoint subsets of trainable parameters for sparse channel unfreezing: i) a random subset and ii) a subset with higher gradient projections onto the principal subspace. The latter serves as a disalignment regularizer during fine-tuning, while the random subset ensures minimal bias in parameter selection. By adjusting the ratio between the two subsets, we can control the strength of the subspace regularization, thereby balancing the trade-off between generalization capacity and strong fitting to new downstream tasks. With DARS, we achieve SOTA performance on various benchmarks, including commonsense and arithmetic reasoning tasks, across LLaMA-7B and LLaMA2-7B.
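To make the selection rule concrete, here is a minimal PyTorch-style sketch of how the two disjoint channel subsets could be chosen for one layer. All function and variable names, and the exact projection criterion, are assumptions for illustration, not the authors' implementation.

```python
import torch

def select_trainable_channels(grad, U, k_total, random_ratio=0.5):
    """Sketch: pick sparse channels to unfreeze, DARS-style.

    grad: (out_channels, in_features) gradient of a layer's weight
    U:    (in_features, r) orthonormal basis of the ID principal subspace
    k_total: total number of channels to unfreeze
    random_ratio: fraction of channels drawn at random (hypothetical knob
                  controlling the strength of the subspace regularization)
    """
    n = grad.shape[0]
    k_rand = int(k_total * random_ratio)
    k_proj = k_total - k_rand

    # Per-channel norm of the gradient projected onto the principal subspace.
    proj_norm = (grad @ U).norm(dim=1)

    # i) random subset: ensures minimal bias in parameter selection
    rand_idx = torch.randperm(n)[:k_rand]

    # ii) channels with the highest gradient projection onto the subspace,
    #     excluding those already chosen at random (keeps the subsets disjoint)
    proj_norm[rand_idx] = float("-inf")
    proj_idx = torch.topk(proj_norm, k_proj).indices

    return torch.cat([rand_idx, proj_idx])
```

Under this reading, a larger random_ratio weakens the disalignment regularizer and favors fitting the downstream task, while a smaller one keeps more updates near the pre-trained principal subspace.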
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 26