BSLoRA: Enhancing the Parameter Efficiency of LoRA with Intra-Layer and Inter-Layer Sharing

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Poster · CC BY 4.0
Abstract: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning method for adapting large language models (LLMs) to downstream tasks. However, when many LoRA models are deployed simultaneously, standard LoRA introduces a substantial number of trainable parameters, resulting in significant memory overhead and inference latency, particularly when a single server must support thousands of downstream tasks. While existing methods reduce the number of stored parameters via parameter sharing, they fail to capture local and global information simultaneously. To address this issue, we propose Bi-Share LoRA (BSLoRA), which extends local LoRA with intra-LoRA and inter-LoRA parameter sharing to better capture both local and global information. This approach reduces trainable parameters while maintaining or even enhancing model performance. Additionally, we design three transformation methods to improve the compatibility and collaborative efficiency of shared parameters with varying shapes, enhancing overall adaptability. Experiments on the 7B, 8B, and 13B versions of Llama show that BSLoRA, with only 44.59% of the parameters of standard LoRA, outperforms LoRA by approximately 0.33% on commonsense reasoning and by 2.08% on MMLU benchmarks. Code is available at https://github.com/yuhua-zhou/BSLoRA.git.
Lay Summary: Large language models can be adapted to different downstream tasks with LoRA fine-tuning. However, as model parameter counts grow, so does the number of parameters LoRA requires, making it important to reduce LoRA's parameter footprint. We first analyze the parameters produced by LoRA fine-tuning and, using an entropy-based similarity measure, find that parameter modules are highly similar both within the same layer and across different layers. Building on this insight, we introduce BSLoRA, which reduces redundancy by enabling both intra-layer and inter-layer parameter sharing, and we propose three shape-transformation strategies to reconcile the mismatched parameter shapes that arise when sharing (a rough sketch of this scheme follows below). Our approach not only cuts the number of parameters needed for LoRA fine-tuning but, in our experiments, also improves the performance of the fine-tuned model to some extent.
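To make the sharing scheme concrete, here is a minimal PyTorch-style sketch of the general idea described above: each adapted linear projection combines a private low-rank pair with an intra-layer shared pair and an inter-layer shared pair. All names (BiShareLoRALinear, fit, the slice/zero-pad transform, the initialization and scaling) are illustrative assumptions, not the authors' implementation; in particular, the simple fit function merely stands in for the paper's three transformation methods.

```python
# Sketch of the Bi-Share idea (assumed PyTorch-style code, not the authors' implementation).
# Each adapted linear layer combines three low-rank updates:
#   (1) a local pair owned by the module itself,
#   (2) an intra-layer pair shared by all projections in the same transformer layer,
#   (3) an inter-layer pair shared across all layers.
import torch
import torch.nn as nn


def fit(mat: torch.Tensor, rows: int, cols: int) -> torch.Tensor:
    """Slice or zero-pad a shared matrix to the shape this module needs (illustrative stand-in
    for the paper's shape-transformation methods)."""
    out = torch.zeros(rows, cols, dtype=mat.dtype, device=mat.device)
    r, c = min(rows, mat.shape[0]), min(cols, mat.shape[1])
    out[:r, :c] = mat[:r, :c]
    return out


class BiShareLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, shared_intra: nn.ParameterDict,
                 shared_inter: nn.ParameterDict, alpha: float = 16.0):
        super().__init__()
        self.base = base                      # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        d_out, d_in = base.out_features, base.in_features
        self.scale = alpha / rank
        # (1) local pair, private to this projection
        self.A_local = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B_local = nn.Parameter(torch.zeros(d_out, rank))
        # (2)/(3) references to shared pairs, created once per layer / once per model
        self.shared_intra = shared_intra      # keys: "A", "B"
        self.shared_inter = shared_inter      # keys: "A", "B"

    def delta(self) -> torch.Tensor:
        d_out, d_in = self.base.out_features, self.base.in_features
        rank = self.A_local.shape[0]
        dw = self.B_local @ self.A_local      # local low-rank update
        for shared in (self.shared_intra, self.shared_inter):
            A = fit(shared["A"], rank, d_in)  # adapt shared shapes to this module
            B = fit(shared["B"], d_out, rank)
            dw = dw + B @ A
        return dw * self.scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta().T
```

In this sketch, the intra-layer ParameterDict would be instantiated once per transformer layer and passed to every projection in that layer, while the inter-layer ParameterDict would be instantiated once per model; only the local pairs plus these small shared pools contribute trainable parameters, which is how the parameter count drops below that of standard LoRA.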
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: Parameter-efficient fine-tuning, parameter-sharing
Submission Number: 9541