Abstract: Existing methods for fine-tuning LLMs, such as Adapter, Prefix-tuning, and LoRA, introduce extra modules or additional input sequences to inject new skills or knowledge, and in doing so may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, LLaMA-Excitor does not directly change the intermediate hidden states during the self-attention calculation. Instead, we design the Excitor block as a bypass module that reconstructs Keys and changes the importance of Values in self-attention using learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning on low-quality instruction-following datasets. Furthermore, we unify the modeling of multi-modal and language-only tuning, extending LLaMA-Excitor to a powerful visual instruction follower without the need for complex multi-modal alignment. Our approach is evaluated in both language-only and multi-modal scenarios. Compared with the original LLaMA-7B, LLaMA-Excitor is the only PEFT method that maintains basic capabilities while achieving a +3.12% relative improvement on the MMLU benchmark. In visual instruction tuning, we achieve new state-of-the-art image captioning performance on MSCOCO (157.5 CIDEr) and performance on ScienceQA (88.39%) comparable to cutting-edge models with more parameters and extensive vision-language pre-training. The code will be available at https://zoubo9034.github.io/Excitor/.
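The abstract only sketches the mechanism. As a rough illustration (not the authors' implementation), the PyTorch snippet below shows one way a bypass could use learnable prompts to reconstruct extra Keys and re-weight the attention given to existing Values, while leaving the frozen layer's hidden states untouched. The class name `ExcitorBypass`, the low-rank projection, the gate, and all hyperparameters are assumptions made for this sketch.

```python
# Minimal sketch of a prompt-driven attention bypass (assumed design,
# not the official LLaMA-Excitor code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExcitorBypass(nn.Module):  # hypothetical name
    def __init__(self, dim, num_prompts=16, rank=8):
        super().__init__()
        # Learnable prompts that will be turned into additional Keys.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        # Low-rank projection that "reconstructs" Keys from the prompts.
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        # Zero-initialized gate: the bypass starts as a no-op and its
        # influence grows gradually during fine-tuning.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden, attn_scores, value):
        # hidden:      (B, T, D) token states feeding the frozen attention
        # attn_scores: (B, T, T) original pre-softmax logits (queries x keys)
        # value:       (B, T, D) original Values, left unchanged
        B, T, D = hidden.shape
        # Reconstructed Keys from the learnable prompts: (P, D).
        k_extra = self.up(self.down(self.prompts))
        # Score each token against the prompt Keys; tokens that match the
        # prompts are treated as more "worthwhile": (B, T, P) -> (B, T).
        relevance = (hidden @ k_extra.t()) / D ** 0.5
        bias = torch.tanh(self.gate) * relevance.max(dim=-1).values
        # Add the bias per key position so softmax shifts attention toward
        # the Values of worthwhile tokens, without injecting new Values.
        probs = F.softmax(attn_scores + bias.unsqueeze(1), dim=-1)
        return probs @ value  # (B, T, D)
```

Because the bias is applied per key position (not per query), it genuinely reshapes the attention distribution over Values, and the zero-initialized gate keeps the frozen model's original behavior at the start of tuning.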