Keywords: PEFT, finetuning, SAE
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods have become essential for adapting large pretrained models to downstream tasks, with Low-Rank Adaptation (LoRA) emerging as one of the most widely adopted solutions.
However, current LoRA-based PEFT methods have three key limitations:
(1) the low-rank feature space in LoRA is rigid, reducing its capacity for dynamic adaptation;
(2) the restricted dimensionality, coupled with dense and entangled representations, constrains the model’s capacity to generalize across multiple domains;
and (3) compressing the adaptation into a low-rank subspace obscures the functional role of task-relevant features, making the model’s behavior hard to interpret from the learned representations.
In this paper, we argue that sparse adaptation offers a principled and more flexible alternative to low-rank adaptation, with the added benefit of enhancing interpretability.
Instead of compressing information into a low-rank subspace, sparse adaptation focuses on identifying and selectively activating a small subset of high-dimensional latent features, enabling a more decomposed and dynamic fine-tuning process.
Building on this paradigm, we propose STAN (Sparse adapTAtioN), a novel method that realizes sparse adaptation by integrating dedicated Sparse Autoencoder (SAE) modules into frozen pretrained models.
STAN learns to encode task-specific adaptations through sparse activations within the SAEs, thereby using sparse features as the mechanism for dynamic and robust adaptation.
Beyond the flexibility offered by input-dependent sparse combinations, the large latent space of the SAEs provides scalable capacity for cross-domain adaptation, while their inherent semantic decomposition structure supports more interpretable representations.
Through extensive experiments, we demonstrate that STAN outperforms state-of-the-art PEFT baselines across a range of benchmarks, while uniquely enabling inspection and analysis of the learned sparse activations. Our findings position sparse adaptation as a promising new direction in PEFT, advancing both the expressivity and interpretability of model adaptation.
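To make the idea of sparse adaptation concrete, the following is a minimal illustrative sketch, not the STAN implementation. The abstract does not specify the architecture, so every detail here is an assumption: the top-k sparsity rule, the ReLU encoder, the residual placement, the zero-initialized decoder, and the names `SparseAdapter`, `topk_sparse`, `d_latent`, and `k` are all hypothetical. It shows only the general pattern of a hidden state from a frozen model being encoded into a large latent space, with a small input-dependent subset of features kept active and decoded back as an additive update.

```python
import numpy as np

def topk_sparse(z, k):
    """Keep the k largest entries in each row of z and zero out the rest."""
    idx = np.argpartition(z, -k, axis=-1)[..., -k:]
    mask = np.zeros_like(z, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, z, 0.0)

class SparseAdapter:
    """Hypothetical SAE-style adapter applied to a frozen model's hidden state."""

    def __init__(self, d_model, d_latent, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Trainable parameters (assumed); the pretrained model itself stays frozen.
        self.W_enc = rng.normal(0.0, 0.02, size=(d_model, d_latent))
        self.b_enc = np.zeros(d_latent)
        # Zero-initialized decoder (an assumption borrowed from common adapter
        # practice) so the adapter starts out as the identity map.
        self.W_dec = np.zeros((d_latent, d_model))

    def encode(self, h):
        # Project into a latent space much wider than d_model, then sparsify.
        z = np.maximum(h @ self.W_enc + self.b_enc, 0.0)
        return topk_sparse(z, self.k)

    def __call__(self, h):
        # Residual update: only the selected latent features contribute.
        return h + self.encode(h) @ self.W_dec

# Usage: four hidden states of width 16, 128 latent features, at most 8 active.
h = np.random.default_rng(1).normal(size=(4, 16))
adapter = SparseAdapter(d_model=16, d_latent=128, k=8)
z = adapter.encode(h)
out = adapter(h)
```

In this sketch, which latent features fire depends on the input, so different inputs can draw on different subsets of the latent space, in contrast to a single fixed low-rank projection. The nonzero entries of `z` are the kind of sparse activations that could be inspected per input.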
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2686