Attack on LLMs: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem

Hongyi Liu; Shaochen Zhong; Xintong Sun; Minghao Tian; Zirui Liu; Ruixiang Tang; Jiayi Yuan; Yu-Neng Chuang; Li Li; Soo-Hyun Choi; Rui Chen; Vipin Chaudhary; Xia Hu

28 Sept 2024 (modified: 04 Dec 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: LoRA, PEFT, LLM Safety, Backdoor, Backdoor Attack

TL;DR: The LoRA share-and-play ecosystem is convenient but exposes users to maliciously tampered modules. We demonstrate that such tampering can be distributed at scale with minimal effort, highlighting the need for urgent community awareness and action.

Abstract: Finetuning large language models (LLMs) with LoRA has gained significant popularity due to its simplicity and effectiveness. Oftentimes, users can even find pluggable, community-shared LoRA adapters to enhance their base models and enjoy a powerful, efficient, yet customized LLM experience. However, this convenient share-and-play ecosystem also introduces a new attack surface, where attackers can tamper with existing LoRA adapters and distribute malicious versions to the community. Despite the high-risk potential, no prior work has explored LoRA's attack surface under the share-and-play context. In this paper, we address this gap by investigating how backdoors can be injected into task-enhancing LoRA adapters and studying the mechanisms of such infection. We demonstrate that with a simple but specific recipe, a backdoor-infected LoRA can be trained once, then directly merged with multiple LoRA adapters finetuned on different tasks while retaining both its malicious and benign capabilities, which enables attackers to distribute compromised LoRAs at scale with minimal effort. Our work highlights the need for heightened security awareness in the LoRA ecosystem. Warning: the paper contains potentially offensive content generated by models.
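
For readers unfamiliar with why LoRA adapters can be "directly merged" at all, the sketch below shows the generic arithmetic: each adapter contributes an additive low-rank update B·A to a frozen base weight, so several adapters can be summed into the same weight. This is only an illustration of standard LoRA merging, not the paper's recipe, which the abstract does not specify. All names here (`matmul`, `lora_delta`, `merge`, `adapter_a`, `adapter_b`) and the toy 2x2 values are hypothetical.

```python
# Minimal pure-Python sketch of additive LoRA merging (illustrative only).
# Assumption: each adapter is a pair of low-rank factors (B, A) plus a scale,
# and merging means adding scale * (B @ A) to the frozen base weight W0.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, scale=1.0):
    """Return the dense weight update scale * (B @ A) for one adapter."""
    return [[scale * v for v in row] for row in matmul(B, A)]

def merge(W0, adapters):
    """Add every adapter's low-rank update to a copy of the base weight."""
    W = [row[:] for row in W0]
    for B, A, scale in adapters:
        delta = lora_delta(B, A, scale)
        W = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
    return W

# Toy 2x2 base weight and two rank-1 adapters (hypothetical values).
W0 = [[1.0, 0.0], [0.0, 1.0]]
adapter_a = ([[1.0], [0.0]], [[0.0, 2.0]], 1.0)  # B is 2x1, A is 1x2
adapter_b = ([[0.0], [1.0]], [[3.0, 0.0]], 0.5)

merged = merge(W0, [adapter_a, adapter_b])
print(merged)
```

Because the updates are simply summed, the merge order does not matter, and an adapter trained independently of the others can still be folded into the same base weight. Whether each adapter's behavior survives the merge is the empirical question the paper studies.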

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13722
