Bernoulli-LoRA: A Theoretical Framework for Randomized Low-Rank Adaptation

ICLR 2026 Conference Submission 25467 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Parameter-Efficient Fine-Tuning, Low-Rank Adaptation, Non-convex Optimization, Non-smooth Optimization, Stochastic Optimization, Variance Reduction, Adaptive Stepsizes
TL;DR: We introduce Bernoulli-LoRA, a theoretically grounded framework for parameter-efficient fine-tuning that randomly selects which low-rank matrix to update. We provide convergence guarantees for various optimization settings and stepsize choices.
Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a crucial approach for adapting large foundation models to specific tasks, particularly as model sizes continue to grow. Among PEFT methods, Low-Rank Adaptation (LoRA) [Hu et al., 2021] stands out for its effectiveness and simplicity, expressing adaptations as a product of two low-rank matrices. While extensive empirical studies demonstrate LoRA's practical utility, the theoretical understanding of such methods remains limited. Recent work on RAC-LoRA [Malinovsky et al., 2024] took initial steps toward rigorous analysis. In this work, we introduce Bernoulli-LoRA, a novel theoretical framework that unifies and extends existing LoRA approaches. Our method uses a Bernoulli mechanism to randomly select which of the two low-rank matrices to update at each step. This approach encompasses and generalizes various existing update strategies while maintaining theoretical tractability. Under standard assumptions from the non-convex optimization literature, we analyze several variants of our framework: Bernoulli-LoRA-GD, Bernoulli-LoRA-SGD, Bernoulli-LoRA-PAGE, Bernoulli-LoRA-MVR, Bernoulli-LoRA-QGD, Bernoulli-LoRA-MARINA, and Bernoulli-LoRA-EF21, establishing convergence guarantees for each variant. Additionally, we extend our analysis to convex non-smooth functions, providing convergence rates for both constant and adaptive (Polyak-type) stepsizes. Through extensive experiments on various tasks, we validate our theoretical findings and demonstrate the practical efficacy of our approach. This work is a step toward developing theoretically grounded yet practically effective PEFT methods.
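To make the core mechanism concrete, below is a minimal NumPy sketch of a Bernoulli-LoRA-GD-style step: the adapted weight is parameterized as W = W0 + BA, and at each iteration a Bernoulli coin decides whether the gradient step is applied to the factor A or to the factor B. The function name, the toy least-squares objective, the stepsize, and the choice p = 0.5 are illustrative assumptions for this sketch, not the paper's exact algorithm or constants.

```python
import numpy as np

def bernoulli_lora_gd_step(A, B, grad_W, p=0.5, lr=1e-2, rng=None):
    """One illustrative Bernoulli-LoRA-GD-style step (sketch, not the paper's exact method).

    The adapted weight is W = W0 + B @ A (LoRA parameterization),
    and grad_W is the gradient of the loss with respect to the full weight W.
    A Bernoulli coin with success probability p selects which factor is updated.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        # Update B only; by the chain rule, dL/dB = (dL/dW) @ A^T.
        B = B - lr * grad_W @ A.T
    else:
        # Update A only; by the chain rule, dL/dA = B^T @ (dL/dW).
        A = A - lr * B.T @ grad_W
    return A, B

# Toy usage on a least-squares objective f(W) = (1/2n) * ||W X - Y||_F^2,
# where the target differs from W0 by a rank-r perturbation.
rng = np.random.default_rng(0)
d_out, d_in, r, n = 8, 16, 2, 32
W0 = rng.standard_normal((d_out, d_in))
W_star = W0 + rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
X = rng.standard_normal((d_in, n))
Y = W_star @ X

A = 0.01 * rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # common LoRA convention: the adapter starts at zero
for _ in range(2000):
    W = W0 + B @ A
    grad_W = (W @ X - Y) @ X.T / n
    A, B = bernoulli_lora_gd_step(A, B, grad_W, p=0.5, lr=0.05, rng=rng)
```

Setting p = 1 or p = 0 in this sketch recovers the degenerate case where only one factor is ever trained, while intermediate p alternates updates between the two factors at random, which is the behavior the framework is meant to capture and analyze.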
Primary Area: optimization
Submission Number: 25467