Abstract: Recent low-rank training methods, such as GaLore, have significantly reduced the memory required to optimize large language models (LLMs). However, these methods often suffer from time-consuming low-rank projection estimation. In particular, the singular value decomposition (SVD) in GaLore can consume more than 80\% of the total training time. To address this issue, we propose CrossLore, which uses cross-head low-rank projection to reduce the substantial time spent estimating low-rank projections for multi-head attention. In addition, we employ randomized subspace iteration to achieve fast SVD. To further enhance performance, we propose sparsely coded residuals to reduce the errors that low-rank approximation introduces into the first- and second-order moments of the optimizer and the weight updates. We evaluate CrossLore on arithmetic reasoning and natural language generation datasets. Our experiments demonstrate that CrossLore delivers superior performance while achieving an approximately $4\times$ fine-tuning speedup over vanilla GaLore.
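For readers unfamiliar with the fast-SVD component mentioned in the abstract, the following is a minimal sketch of standard randomized subspace iteration (in the style of Halko et al., 2011) for approximating the top-$r$ singular subspace of a gradient matrix. It is a generic illustration under assumed PyTorch tensors, not CrossLore's actual implementation; the function name, oversampling size, and iteration count are hypothetical choices.

```python
import torch

def randomized_subspace_iteration(G, rank, n_iter=2, oversample=8):
    """Approximate the top-`rank` singular triplets of G (m x n) via
    randomized subspace iteration. Illustrative sketch only; not the
    authors' code."""
    m, n = G.shape
    k = min(rank + oversample, n)
    # Random Gaussian test matrix sketches the range of G.
    Omega = torch.randn(n, k, device=G.device, dtype=G.dtype)
    Y = G @ Omega                        # (m, k) range sketch
    Q, _ = torch.linalg.qr(Y)            # orthonormal basis for the sketch
    for _ in range(n_iter):              # power iterations sharpen the subspace
        Z, _ = torch.linalg.qr(G.T @ Q)
        Q, _ = torch.linalg.qr(G @ Z)
    # A small SVD on the projected matrix recovers approximate singular vectors.
    B = Q.T @ G                          # (k, n), much smaller than G
    U_hat, S, Vh = torch.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :rank], S[:rank], Vh[:rank]
```

Because only QR factorizations and an SVD of the small $k \times n$ matrix are needed, this kind of routine avoids the full SVD whose cost the abstract identifies as the dominant bottleneck in vanilla GaLore.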
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: large language models, parameter-efficient fine-tuning, low-rank
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 3288