SARA: Single-Head Attention-based Random Matrix Adaptation

ACL ARR 2025 February Submission7579 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Fully fine-tuning large language models by updating all parameters is both computationally expensive and storage-intensive, particularly when deploying multiple task-specific models. Existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce the number of trainable parameters via low-rank adaptations, yet they still face scalability challenges as model sizes increase. In this work, we introduce SARA (Single-Head Attention-based Random Matrix Adaptation), a novel PEFT approach that combines frozen random matrices with a single-head attention mechanism to further reduce the number of trainable parameters while preserving competitive performance. By keeping the pretrained weights frozen and fine-tuning only a minimal set of additional parameters, SARA offers significant memory savings without compromising accuracy. We validate our method on two standard benchmarks: the GLUE benchmark for natural language understanding and the E2E challenge for natural language generation. Our results demonstrate that SARA achieves competitive performance with a substantially reduced parameter footprint, making it a promising solution for resource-constrained model adaptation.
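The abstract does not include implementation details, so the following is only a minimal sketch of what a SARA-style adapter could look like, assuming a LoRA-like formulation in which the low-rank projection matrices are frozen random Gaussians and only a small single-head attention block operating in the bottleneck dimension is trained. All class, parameter, and hyperparameter names here (e.g. `SARALinear`, `r`, `alpha`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SARALinear(nn.Module):
    """Hypothetical SARA-style adapter around a frozen pretrained linear layer.

    The pretrained weight W0 and the random projections A, B are frozen;
    only the single-head attention parameters in the rank-r bottleneck are
    trained. Illustrative sketch only, not the authors' implementation.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pretrained weights frozen

        d_in, d_out = base.in_features, base.out_features
        # Frozen random projections (assumption: scaled Gaussian init).
        self.A = nn.Parameter(torch.randn(d_in, r) / r ** 0.5, requires_grad=False)
        self.B = nn.Parameter(torch.randn(r, d_out) / r ** 0.5, requires_grad=False)

        # Trainable single-head attention over the r-dimensional bottleneck.
        self.q = nn.Linear(r, r, bias=False)
        self.k = nn.Linear(r, r, bias=False)
        self.v = nn.Linear(r, r, bias=False)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        h = x @ self.A                                   # down-project with frozen random matrix
        q, k, v = self.q(h), self.k(h), self.v(h)
        attn = F.softmax(q @ k.transpose(-2, -1) / (h.size(-1) ** 0.5), dim=-1)
        h = attn @ v                                     # single-head attention update
        return self.base(x) + (h @ self.B) * self.scaling  # frozen base + scaled adapter output
```

Under these assumptions, wrapping selected projection layers of a frozen transformer with such a module and training only the three r-by-r attention matrices would yield the kind of reduced parameter footprint the abstract describes; the actual placement, initialization, and scaling used in the paper may differ.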
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Languages Studied: English
Submission Number: 7579