Keywords: Low-Rank Adaptation (LoRA), Adversarial Robustness
TL;DR: We study the effect of LoRA-based fine-tuning on adversarial robustness, showing that it improves clean accuracy but can reduce robustness compared to head-only fine-tuning, with the initialization scheme and other design choices playing a key role.
Abstract: Low-rank adaptation (LoRA) has emerged as a prominent parameter-efficient fine-tuning (PEFT) method for large pre-trained models, enabling strong downstream performance with minimal parameter updates. While LoRA is known to outperform head-only fine-tuning in terms of clean accuracy, its impact on adversarial robustness remains largely unexplored. In this work, we present what is, to the best of our knowledge, the first theoretical analysis of LoRA’s adversarial robustness, comparing it to that of head-only fine-tuning. We formalize the notion of expected adversarial robustness and derive upper bounds demonstrating that, despite its superior clean performance, LoRA can be inherently less robust than head-only tuning due to the additional degrees of freedom introduced by its low-rank components. We further study the influence of LoRA’s initialization scheme and show that simple changes in the initialization distribution of the low-rank matrix can significantly affect robustness. Finally, we support our theoretical findings with extensive experiments on both vision and language benchmarks under standard adversarial attacks. Our results provide a principled understanding of the trade-offs between parameter efficiency, clean performance, and adversarial robustness in commonly used fine-tuning strategies.
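For concreteness, the sketch below (not the authors' code) illustrates the two fine-tuning strategies the abstract compares and a standard single-step robustness probe, in PyTorch. The names `LoRALinear`, `head_only`, `fgsm_robust_accuracy`, and the `init_std` knob for the low-rank initialization distribution are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of LoRA vs. head-only fine-tuning,
# plus a single-step FGSM robustness probe. All names are illustrative.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0,
                 init_std: float = 0.02):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Standard LoRA scheme: A is Gaussian, B is zero, so the update starts
        # at zero; init_std is a hypothetical knob for the initialization
        # distribution whose effect on robustness the abstract studies.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * init_std)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)


def head_only(backbone: nn.Module, head: nn.Module) -> None:
    """Head-only baseline: freeze the backbone, train only the classification head."""
    for p in backbone.parameters():
        p.requires_grad = False
    for p in head.parameters():
        p.requires_grad = True


def fgsm_robust_accuracy(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                         eps: float = 8 / 255) -> float:
    """Accuracy under a one-step FGSM attack (inputs assumed to lie in [0, 1]).

    In practice, call with model.eval() and clear parameter gradients afterward.
    """
    x = x.clone().requires_grad_(True)
    nn.functional.cross_entropy(model(x), y).backward()
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
    with torch.no_grad():
        return (model(x_adv).argmax(dim=-1) == y).float().mean().item()
```

Under this framing, LoRA's extra trainable directions (the B A update inside the backbone) correspond to the additional degrees of freedom the abstract's upper bounds refer to, whereas head-only tuning leaves the backbone's feature map fixed.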
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17544