BiLoRA: A Bi-level Optimization Framework for Low-rank Adapters

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Parameter-efficient fine-tuning, Natural language processing, Low-rank adaptation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose BiLoRA, a bi-level optimization framework that alleviates overfitting in low-rank adaptation of large pretrained models.
Abstract: Low-rank adaptation (LoRA) is widely employed for fine-tuning large-scale pretrained models on downstream tasks by learning low-rank incremental matrices. LoRA and its variants, such as AdaLoRA, train an entire low-rank incremental matrix on a single training dataset, which often leads to overfitting to the training data and inferior generalization on test data. To address this problem, we propose a bi-level optimization (BLO) based method for alleviating overfitting. Our method parameterizes a low-rank incremental matrix in a pseudo singular value decomposition form and separates the training of pseudo singular vectors and pseudo singular values across different data subsets in different optimization problems. This separation reduces the risk of overfitting to a single dataset and improves generalization to other data. Specifically, in the lower level of our BLO formulation, we train the pseudo singular vectors on one subset of the training data; in the upper level, we learn the pseudo singular values on the other subset. The two optimization problems are mutually dependent and solved jointly. On ten datasets from natural language understanding and generation tasks, and on various popular large pretrained models, our method achieves significantly better performance than LoRA, AdaLoRA, and other fine-tuning baselines with similar numbers of trainable parameters.
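To make the abstract's setup concrete, below is a minimal PyTorch-style sketch of a low-rank adapter in pseudo-SVD form whose pseudo singular vectors and pseudo singular values are updated on different data subsets. This is an illustrative assumption of how such a scheme could look, not the authors' implementation: the class and function names (PseudoSVDAdapter, bilevel_step) are hypothetical, and the simple alternating updates stand in for a full bi-level solver with hypergradients.

```python
# Illustrative sketch (not the authors' code): low-rank increment in
# pseudo-SVD form, delta_W = P @ diag(lam) @ Q, added to a frozen weight.
# Pseudo singular vectors (P, Q) and pseudo singular values (lam) are
# updated on different data subsets, approximating the bi-level scheme
# with simple alternating steps.
import torch
import torch.nn as nn


class PseudoSVDAdapter(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weight stays frozen
        d_out, d_in = base_linear.weight.shape
        # Pseudo singular vectors (lower-level variables).
        self.P = nn.Parameter(torch.randn(d_out, rank) * 0.01)
        self.Q = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        # Pseudo singular values (upper-level variables), initialized to zero
        # so the adapter starts as the identity perturbation.
        self.lam = nn.Parameter(torch.zeros(rank))

    def forward(self, x):
        delta_w = self.P @ torch.diag(self.lam) @ self.Q  # (d_out, d_in)
        return self.base(x) + x @ delta_w.t()


def bilevel_step(model, loss_fn, lower_batch, upper_batch,
                 opt_vectors, opt_values):
    """One alternating update: vectors on subset A, values on subset B."""
    # Lower level: update pseudo singular vectors (P, Q) on subset A.
    opt_vectors.zero_grad()
    loss_fn(model, lower_batch).backward()
    opt_vectors.step()
    # Upper level: update pseudo singular values (lam) on subset B.
    opt_values.zero_grad()
    loss_fn(model, upper_batch).backward()
    opt_values.step()
```

In this sketch, `opt_vectors` would be built over the P/Q parameters and `opt_values` over the lam parameters, with the two data loaders drawing from disjoint splits of the training set; the actual BiLoRA formulation solves the two coupled optimization problems jointly rather than by plain alternation.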
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1816