BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation

TMLR Paper4878 Authors

17 May 2025 (modified: 27 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Parameter-efficient fine-tuning (PEFT) is a flexible and efficient method for adapting large language models (LLMs) to downstream tasks. Among these methods, weight-decomposed low-rank adaptation (DoRA) is a promising approach that decomposes weight matrices into magnitude and direction components to better mimic full fine-tuning (FT). However, DoRA optimizes both components simultaneously, which makes it over-expressive, increases the risk of overfitting, and couples the magnitude and direction updates in a way that limits its learning capacity. To address these issues, we propose BiDoRA, a novel PEFT method based on a bi-level optimization framework. BiDoRA fundamentally differs from DoRA by optimizing the magnitude and direction in two separate, asynchronous loops on distinct training and validation data splits. This decoupled optimization mitigates overfitting and allows for more flexible updates that align more closely with FT. For instance, weight decomposition analysis shows that BiDoRA achieves a magnitude-direction update correlation of $-8.042$, significantly closer to the FT ideal than the $-1.784$ of DoRA. Evaluation of BiDoRA on diverse tasks spanning natural language understanding, natural language generation, token classification, and extremely small biomedical datasets shows that it consistently outperforms DoRA and other leading PEFT methods. The improvement is statistically significant: on the GLUE benchmark, BiDoRA surpasses DoRA with a p-value of $2.4\times10^{-4}$. The code for BiDoRA is available at https://anonymous.4open.science/r/BiDoRA-5D31.
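To make the decoupling described above concrete, the following is a minimal, self-contained sketch (not the released BiDoRA code) of a DoRA-style weight decomposition trained with simple alternating first-order updates: the low-rank direction factors are treated as lower-level variables updated on a training split, while the magnitude vector is treated as the upper-level variable updated on a validation split. The toy linear-regression data, shapes, learning rates, and the plain alternating update (used here in place of a full bi-level hypergradient computation) are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy problem: fit a single linear layer (all sizes and values are illustrative).
d_out, d_in, r = 32, 32, 4
W_true = torch.randn(d_out, d_in)
X_train, X_val = torch.randn(256, d_in), torch.randn(64, d_in)
Y_train, Y_val = X_train @ W_true.T, X_val @ W_true.T

W0 = torch.randn(d_out, d_in)                    # frozen "pretrained" weight
A = torch.zeros(r, d_in, requires_grad=True)     # low-rank direction factors
B = (0.01 * torch.randn(d_out, r)).requires_grad_(True)
m = W0.norm(dim=1, keepdim=True).clone().requires_grad_(True)  # per-row magnitude

def adapted_weight():
    V = W0 + B @ A                               # direction: frozen weight + low-rank update
    return m * V / V.norm(dim=1, keepdim=True)   # recompose: magnitude times unit direction

opt_dir = torch.optim.AdamW([A, B], lr=1e-2)     # lower level, updated on the training split
opt_mag = torch.optim.AdamW([m], lr=1e-3)        # upper level, updated on the validation split

for step in range(200):
    # Lower level: update the direction factors (A, B) on training data.
    opt_dir.zero_grad()
    F.mse_loss(X_train @ adapted_weight().T, Y_train).backward()
    opt_dir.step()

    # Upper level: update the magnitude m on held-out validation data,
    # decoupled from the direction update above.
    opt_mag.zero_grad()
    F.mse_loss(X_val @ adapted_weight().T, Y_val).backward()
    opt_mag.step()
```

The key design point illustrated here is that only the direction parameters ever receive gradients from the training split, while the magnitude receives gradients only from the validation split, which is the decoupled, overfitting-resistant update pattern the abstract describes.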
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Samira_Ebrahimi_Kahou1
Submission Number: 4878