Abstract: Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely underexplored. We start by investigating two fundamental questions on existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for optimal results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective for linear projection matrices, it fails on SSM modules—yet still outperforms other methods applicable to SSMs, indicating their limitations. This underscores the need for a specialized SSM tuning approach. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules. Combining SDT for SSMs with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
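The snippet below is a minimal, hypothetical sketch of the combination the abstract describes: LoRA adapters on a block's linear projection matrices, together with updating only a sparse subset of dimensions of an SSM parameter. It is not the authors' implementation; the module and parameter names (in_proj, out_proj, A_log), the LoRA rank, and the kept dimension indices are illustrative assumptions, and the abstract does not specify how SDT actually selects dimensions.

# Illustrative sketch only -- not the authors' SDT implementation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA) update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


def keep_sparse_dimensions(param: torch.Tensor, keep_idx: torch.Tensor):
    """Mask gradients so only the selected rows of the parameter are updated.
    The indices used here are arbitrary; SDT's actual selection criterion is
    not described in the abstract."""
    mask = torch.zeros_like(param)
    mask[keep_idx] = 1.0
    param.register_hook(lambda grad: grad * mask)


class ToySSMBlock(nn.Module):
    """Toy block with Mamba-like attribute names (assumed for illustration)."""

    def __init__(self, d_model: int = 64, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.A_log = nn.Parameter(torch.randn(d_model, d_state))  # SSM parameter


block = ToySSMBlock()
for p in block.parameters():                 # freeze the pretrained weights
    p.requires_grad_(False)

block.in_proj = LoRALinear(block.in_proj)    # LoRA on linear projection matrices
block.out_proj = LoRALinear(block.out_proj)

block.A_log.requires_grad_(True)             # tune only a sparse subset of SSM dimensions
keep_sparse_dimensions(block.A_log, keep_idx=torch.tensor([0, 5, 9]))

After this setup, only the LoRA factors and the selected rows of the SSM parameter receive gradient updates, which mirrors the parameter-efficient combination the abstract advocates.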
Lay Summary: AI chatbots like ChatGPT rely on large models that are expensive to adapt to new tasks or update with new knowledge. To reduce this cost, researchers have developed tuning methods that adjust only a small subset of the model—a strategy that works well for traditional model architectures. However, these traditional models can be slow when processing long texts. A newer type of model architecture handles long inputs much more efficiently, but it is unclear whether existing tuning methods are still effective. We evaluated these methods on the new models and identified their strengths and weaknesses. Based on our findings, we developed a new tuning method tailored to this faster architecture. Our approach makes these models easier and more cost-effective to adapt.
Link To Code: https://github.com/furiosa-ai/ssm-peft
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: parameter-efficient fine-tuning, state space model, mamba, lora
Submission Number: 822