Keywords: Large Language Models, Low-rank adaptation, Bayesian estimation, Fine-tune
TL;DR: Memory-efficient Low-Rank Adaptation introduces a low-dimensional square matrix between the two low-rank matrices in LoRA and performs Bayesian modeling on this low-dimensional matrix.
Abstract: Bayesian Low-Rank Adaptation (LoRA) has shown excellent performance in reducing the overconfidence of inference by large language models, as it can accurately quantify the inference uncertainty. However, the general Bayesian LoRA technique requires substantial memory because it fine-tunes three large low-rank matrices: two of size $n\times r$ and one of size $r\times m$, where $r$ denotes the rank and $n, m$ denote the input and output sizes, respectively. This large memory requirement hinders its practical application, especially in cases with large input or output sizes. Here, we propose a memory-efficient Bayesian LoRA technique (called Me-LoRA) that needs only two low-rank matrices plus two small matrices of size $r\times r$. The key idea of our approach is to introduce a small matrix (of size $r\times r$) that describes the variance estimates required by Bayesian LoRA and is calculated by sampling two other small matrices. Compared with the general Bayesian LoRA technique, our approach reduces the memory requirement by nearly $\frac{1}{3}$, as the rank $r$ is generally very small. Experimental results with both LLaMA-7B and LLaMA-13B models on representative datasets suggest that our approach achieves the same performance as the original Bayesian LoRA technique and outperforms existing approaches. In summary, the memory-efficient Bayesian LoRA presented in this study circumvents the challenge of high memory requirements and thus paves a new way toward practical applications of Bayesian LoRA in cases with larger input and output sizes.
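To make the described architecture concrete, the sketch below illustrates a Me-LoRA-style linear layer in PyTorch: a frozen weight plus a low-rank update $B S A$, where only the small $r\times r$ core $S$ is modeled Bayesianly, here via a mean matrix and a log-variance matrix sampled with the reparameterization trick. All names, initializations, and the specific parameterization of $S$ are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class MeLoRALinear(nn.Module):
    """Illustrative Me-LoRA-style layer (assumed parameterization, not the paper's code).

    Output = x @ (W + B @ S @ A)^T, where W is frozen, A (r x n) and B (m x r)
    are deterministic LoRA factors, and the r x r core S is Bayesian:
    S is sampled from a Gaussian defined by two small r x r matrices.
    """

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        # Frozen pretrained weight (randomly initialized here for illustration).
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        # Two low-rank LoRA factors; small random init so the update is visible
        # in this demo (standard LoRA would zero-initialize B).
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)   # r x n
        self.B = nn.Parameter(torch.randn(out_features, rank) * 0.01)  # m x r
        # Two small r x r matrices parameterizing the Bayesian core S.
        self.S_mu = nn.Parameter(torch.zeros(rank, rank))
        self.S_logvar = nn.Parameter(torch.full((rank, rank), -5.0))

    def sample_S(self) -> torch.Tensor:
        # Reparameterized sample: S = mu + std * eps, eps ~ N(0, I).
        std = torch.exp(0.5 * self.S_logvar)
        return self.S_mu + std * torch.randn_like(std)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sample S during training; use the posterior mean at evaluation time.
        S = self.sample_S() if self.training else self.S_mu
        delta = self.B @ S @ self.A  # m x n low-rank update
        return x @ (self.weight + delta).T


# Usage sketch: predictive uncertainty from multiple stochastic forward passes.
layer = MeLoRALinear(in_features=4096, out_features=4096, rank=8)
x = torch.randn(2, 4096)
preds = torch.stack([layer(x) for _ in range(8)])
mean, var = preds.mean(dim=0), preds.var(dim=0)
```

Under this parameterization, only the two $r\times r$ matrices carry Bayesian parameters, so the extra memory over plain LoRA is negligible when $r \ll n, m$.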
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8899