Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose Flat-LoRA that aims to efficiently optimize the sharpness of the loss landscape in the full parameter space for low-rank adaptation.
Abstract: Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computation and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, offers an efficient solution by optimizing only low-rank matrices. Despite recent progress in improving LoRA's performance, the relationship between the LoRA optimization space and the full parameter space is often overlooked. A solution that appears flat in the loss landscape of the LoRA space may still exhibit sharp directions in the full parameter space, potentially compromising generalization. We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space. Instead of adopting the well-established sharpness-aware minimization approach, which incurs significant computation and memory overheads, we employ a Bayesian expectation loss objective to preserve training efficiency. Further, we design a refined strategy for generating random perturbations to enhance performance and carefully manage memory overhead using random seeds. Experiments across diverse tasks—including mathematical reasoning, coding abilities, dialogue generation, instruction following, and text-to-image generation—demonstrate that Flat-LoRA improves both in-domain and out-of-domain generalization. Code is available at https://github.com/nblt/Flat-LoRA.
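For intuition, the following is a minimal, hypothetical PyTorch-style sketch of one training step in the spirit of the expected-loss objective described above: a seeded random perturbation is injected into the effective full-space weight (frozen base plus low-rank update), the loss is computed on the perturbed model, the perturbation is removed, and only the low-rank matrices are updated. The layer class, the element-wise Gaussian noise, and its norm-based scaling are illustrative assumptions, not the paper's exact perturbation-generation scheme.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update (illustrative)."""
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank
        self._eps = None  # perturbation held only between inject/remove

    def effective_weight(self):
        return self.weight + self.scaling * (self.B @ self.A)

    def inject_perturbation(self, seed, sigma=0.05):
        # Regenerate noise from a seed instead of storing an extra weight copy.
        g = torch.Generator(device=self.weight.device).manual_seed(seed)
        noise = torch.randn(self.weight.shape, generator=g,
                            device=self.weight.device)
        # Assumed scaling: noise magnitude proportional to the merged weight norm.
        w = self.effective_weight().detach()
        self._eps = sigma * noise * w.norm() / noise.norm()
        self.weight.data.add_(self._eps)

    def remove_perturbation(self):
        self.weight.data.sub_(self._eps)
        self._eps = None

    def forward(self, x):
        return x @ self.effective_weight().t()


def flat_lora_style_step(layer, batch, loss_fn, optimizer, step):
    seed = 1234 + step                       # deterministic, re-derivable seed
    layer.inject_perturbation(seed)          # perturb the full-space weight
    loss = loss_fn(layer(batch["x"]), batch["y"])
    loss.backward()                          # gradients flow to A and B only
    layer.remove_perturbation()              # restore weights before the update
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```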
Lay Summary: A popular approach for fine-tuning large-scale pre-trained models on specific downstream tasks is to optimize only a small subset of parameters, a family of techniques known as Parameter-Efficient Fine-Tuning (PEFT). These methods, such as those using low-rank matrices, are widely adopted for their efficiency. However, from the perspective of the loss landscape, there is a notable inconsistency between PEFT methods and full fine-tuning: a solution that is flat (which is desirable for generalization) in the reduced parameter space is not necessarily flat in the full parameter space. Traditionally, Sharpness-Aware Minimization (SAM) is employed to encourage flatness across the entire parameter space. However, SAM doubles the training time and requires an additional copy of the model weights to compute the sharpness direction, making it often impractical due to its computational and memory overhead. Our goal is to achieve flatness in the full parameter space for PEFT methods while maintaining efficiency in both time and memory. To this end, we propose a simple approach that introduces carefully designed random perturbations. These perturbations can be efficiently generated and stored using random seeds, keeping the method lightweight. Our approach can be easily integrated into existing efficient fine-tuning methods, enhancing generalization performance with minimal additional cost.
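The seed trick mentioned above can be illustrated with a short, self-contained snippet (hypothetical helper, not the released implementation): because the same random tensor can be regenerated from its seed, the perturbation is never kept in memory; it is simply re-drawn and subtracted when it is time to restore the weights.

```python
import torch

def perturb_(weight, seed, sigma=0.05, sign=1.0):
    """Add (sign=+1) or exactly undo (sign=-1) seeded Gaussian noise in place."""
    g = torch.Generator(device=weight.device).manual_seed(seed)
    noise = torch.randn(weight.shape, generator=g, device=weight.device)
    weight.add_(sign * sigma * noise)

w = torch.randn(4, 4)
w_ref = w.clone()

perturb_(w, seed=42, sign=+1.0)   # perturb before the forward pass
# ... forward/backward pass on the perturbed weights would happen here ...
perturb_(w, seed=42, sign=-1.0)   # regenerate the same noise and remove it

assert torch.allclose(w, w_ref)   # weights restored without storing the noise
```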
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/nblt/Flat-LoRA
Primary Area: Optimization
Keywords: Low-rank adaptation, generalization, efficient training
Submission Number: 9640