SARA: Single-Head Attention-based Random Matrix Adaptation

ACL ARR 2025 February Submission7579 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Fully fine-tuning large language models by updating all parameters is both computationally expensive and storage-intensive, particularly when deploying multiple task-specific models. Existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce the number of trainable parameters via low-rank adaptations, yet they still face scalability challenges as model sizes increase. In this work, we introduce SARA (Single-Head Attention-based Random Matrix Adaptation), a novel PEFT approach that combines frozen random matrices with a single-head attention mechanism to further reduce the number of trainable parameters while preserving competitive performance. By keeping the pretrained weights frozen and fine-tuning only a minimal set of additional parameters, SARA offers significant memory savings without compromising accuracy. We validate our method on two standard benchmarks: the GLUE benchmark for natural language understanding and the E2E challenge for natural language generation. Our results demonstrate that SARA achieves competitive performance with a substantially reduced parameter footprint, making it a promising solution for resource-constrained model adaptation.
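The abstract does not include implementation details, so the following is only a minimal sketch of what a SARA-style adapter could look like, assuming a LoRA-like formulation in which the low-rank projection matrices are frozen random Gaussians and only a small single-head attention block operating in the bottleneck dimension is trained. All class, parameter, and hyperparameter names here (e.g. `SARALinear`, `r`, `alpha`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SARALinear(nn.Module):
    """Hypothetical SARA-style adapter around a frozen pretrained linear layer.

    The pretrained weight W0 and the random projections A, B are frozen;
    only the single-head attention parameters in the rank-r bottleneck are
    trained. Illustrative sketch only, not the authors' implementation.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pretrained weights frozen

        d_in, d_out = base.in_features, base.out_features
        # Frozen random projections (assumption: scaled Gaussian init).
        self.A = nn.Parameter(torch.randn(d_in, r) / r ** 0.5, requires_grad=False)
        self.B = nn.Parameter(torch.randn(r, d_out) / r ** 0.5, requires_grad=False)

        # Trainable single-head attention over the r-dimensional bottleneck.
        self.q = nn.Linear(r, r, bias=False)
        self.k = nn.Linear(r, r, bias=False)
        self.v = nn.Linear(r, r, bias=False)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        h = x @ self.A                                   # down-project with frozen random matrix
        q, k, v = self.q(h), self.k(h), self.v(h)
        attn = F.softmax(q @ k.transpose(-2, -1) / (h.size(-1) ** 0.5), dim=-1)
        h = attn @ v                                     # single-head attention update
        return self.base(x) + (h @ self.B) * self.scaling  # frozen base + scaled adapter output
```

Under these assumptions, wrapping selected projection layers of a frozen transformer with such a module and training only the three r-by-r attention matrices would yield the kind of reduced parameter footprint the abstract describes; the actual placement, initialization, and scaling used in the paper may differ.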
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Languages Studied: English
Submission Number: 7579