RobustIT: Adapter-Centric and Attack-Agnostic Anti-Backdoor Instruction Tuning

16 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: robustness, security, large visual language models
Abstract: Large visual language models (LVLMs) have demonstrated excellent instruction-following capabilities, yet remain vulnerable to stealthy backdoor attacks when fine‑tuned on contaminated data. Existing defenses typically assume full access to model parameters or rely on known trigger patterns and clean validation sets — assumptions that fail in real‑world, efficient tuning settings where visual encoders and core LLM weights are frozen and attack priors are unavailable. Motivated by the empirical insight that LVLM adapters quickly overfit to fixed triggers, we propose \textbf{R}obust \textbf{I}nstruction \textbf{T}uning~(\textbf{RobustIT}), a lightweight, \emph{attack‑agnostic} framework that tunes only adapter modules and text‑embedding layers. RobustIT combines two complementary regularizations: (1) \emph{\textbf{Input Diversity Regularization}}, which applies randomized spatial, color, and textual perturbations to disrupt fixed trigger–response mappings and consistent spurious cues; and (2) \emph{\textbf{Anomalous Activation Regularization}}, which dynamically sparsifies adapter channels exhibiting the abnormally sharp activations associated with backdoor patterns. This dual strategy steers the model toward semantically grounded representations without touching frozen cores or requiring any trigger supervision. Extensive evaluations against seven backdoor attacks on Flickr30k and MSCOCO show that RobustIT drives attack success rates to near zero with under 15\% extra training cost while preserving or improving standard task performance, highlighting the critical role of safeguards for efficient fine-tuning in securing real-world LVLM deployments.
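The two regularizations described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the perturbation magnitudes, the channel-wise mean-plus-`k`-sigma anomaly threshold, and all function names here are illustrative assumptions based only on the high-level description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def diversify_input(image):
    """Input Diversity Regularization (sketch): apply a small random
    spatial shift and per-channel color jitter so that a fixed pixel
    trigger no longer maps consistently to the backdoor response.
    `image` is an HxWxC float array in [0, 1]."""
    dy, dx = rng.integers(-2, 3, size=2)           # random spatial shift
    shifted = np.roll(image, (int(dy), int(dx)), axis=(0, 1))
    jitter = 1.0 + rng.uniform(-0.1, 0.1, size=3)  # random color jitter
    return np.clip(shifted * jitter, 0.0, 1.0)

def sparsify_anomalous(acts, k=3.0):
    """Anomalous Activation Regularization (sketch): zero out adapter
    channels whose mean absolute activation is abnormally sharp,
    i.e. exceeds (mean + k * std) across channels.
    `acts` has shape (batch, channels)."""
    per_channel = np.abs(acts).mean(axis=0)
    threshold = per_channel.mean() + k * per_channel.std()
    keep = per_channel <= threshold                # drop outlier channels
    return acts * keep

# Toy demonstration: plant one abnormally sharp channel and suppress it.
image = rng.random((8, 8, 3))
acts = rng.standard_normal((4, 16))
acts[:, 5] += 50.0                                 # simulated backdoor channel
out = sparsify_anomalous(acts)
```

In this toy run the planted channel 5 dominates the per-channel statistics, exceeds the threshold, and is zeroed, while the remaining channels pass through unchanged; `diversify_input` simply returns a perturbed image of the same shape.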
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7373