ToRA: Tensor Adapter for Parameter Efficient Finetuning

ACL ARR 2025 February Submission 4843 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Recent studies show that LoRA cannot match the performance of full fine-tuning (FFT). This work shows that the weights and gradients during fine-tuning have a long-tailed, high-rank spectrum, and that LoRA's difficulties stem from its core low-rank matrix-factorization assumption. ToRA is a LoRA-style parallel adapter that uses the Tensor Train decomposition to efficiently represent the high-rank ∆W. ToRA consistently outperforms LoRA; for example, rank-8 ToRA beats LoRA at all ranks up to 128, sometimes by more than 10 points (80.32 vs. 69.56 on BoolQ and 48.82 vs. 34.20 on MMLU with Llama-3.2-3B). ToRA adapts all self-attention blocks in every layer within the same parameter budget as LoRA, with no tuning or compromise needed. It also pairs well with popular quantization methods such as QLoRA. ToRA is a strong contender as a drop-in replacement for LoRA.
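The abstract describes ToRA as a LoRA-style parallel adapter in which the update ∆W is represented by a Tensor Train (TT) decomposition rather than a rank-r product BA. The sketch below illustrates that general idea in PyTorch; it is not the authors' implementation, and the factor shapes, TT rank, initialization, and the names TTAdapterLinear / delta_w are assumptions made here for illustration.

```python
# Minimal sketch (assumed, not the paper's code): a frozen linear layer plus an
# additive update DeltaW parameterized by Tensor Train (TT) cores instead of B @ A.
import torch
import torch.nn as nn


class TTAdapterLinear(nn.Module):
    """Frozen base linear layer plus a TT-parameterized additive update DeltaW."""

    def __init__(self, base: nn.Linear, in_factors, out_factors, tt_rank=8, scale=1.0):
        super().__init__()
        assert torch.prod(torch.tensor(in_factors)) == base.in_features
        assert torch.prod(torch.tensor(out_factors)) == base.out_features
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # base weights stay frozen, as in LoRA

        self.in_factors, self.out_factors = list(in_factors), list(out_factors)
        self.scale = scale
        d = len(in_factors)
        ranks = [1] + [tt_rank] * (d - 1) + [1]   # boundary TT ranks r_0 = r_d = 1
        # One 4-D core per mode: shape (r_{k-1}, out_k, in_k, r_k).
        # Random init is illustrative only; a LoRA-style scheme would make DeltaW = 0 at start.
        self.cores = nn.ParameterList([
            nn.Parameter(0.02 * torch.randn(ranks[k], out_factors[k], in_factors[k], ranks[k + 1]))
            for k in range(d)
        ])

    def delta_w(self) -> torch.Tensor:
        """Contract the TT cores into the full update matrix DeltaW of shape (out, in)."""
        w = self.cores[0].squeeze(0)                       # (out_1, in_1, r_1)
        for core in self.cores[1:]:
            w = torch.einsum("oir,rpqs->opiqs", w, core)   # absorb the next core
            o, p, i, q, s = w.shape
            w = w.reshape(o * p, i * q, s)                 # merge row and column indices
        return w.squeeze(-1)                               # final rank is 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * nn.functional.linear(x, self.delta_w())


# Example: adapt a 3072-dim projection (e.g. a Llama-3.2-3B attention projection) with TT rank 8.
layer = TTAdapterLinear(nn.Linear(3072, 3072, bias=False),
                        in_factors=(16, 16, 12), out_factors=(16, 16, 12), tt_rank=8)
y = layer(torch.randn(2, 3072))
print(y.shape)  # torch.Size([2, 3072])
```

With these assumed settings (factors 16, 16, 12 and TT rank 8), the cores hold roughly 19.6K trainable parameters versus about 49K for rank-8 LoRA on the same 3072x3072 layer; the exact budget match stated in the abstract would depend on the paper's chosen factorization and ranks.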
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings, Theory
Languages Studied: English
Submission Number: 4843