Keywords: Robustness to Prompt Perturbations, Large Language Models, PAC-Bayesian Generalisation Bound, Low-Rank Adaptation
Abstract: Large language models (LLMs) are highly sensitive to prompt perturbations: small changes to semantically key segments of a prompt can lead to unreliable outputs. Existing robustness methods often optimise holistic objectives, overlooking the asymmetry in semantic importance across prompt segments and offering no certified guarantees. In this work, we propose Semantic Segment Robustness Regularisation (S$^2$R$^2$), a fine-tuning framework based on Low-Rank Adaptation (LoRA) that enforces segment-level alignment and penalises perturbation-induced attention shifts. We demonstrate that this objective is connected to a Probably Approximately Correct (PAC)-Bayesian generalisation bound, which can be formally tightened by constraining the LoRA parameter norms. Experiments across multiple models and domains show that S$^2$R$^2$ consistently reduces empirical risk, achieves significantly tighter bounds than strong baselines, and transfers effectively to out-of-distribution data.
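For context on the guarantee the abstract invokes: a standard McAllester-style PAC-Bayesian bound takes the form below (the paper's exact statement may differ). With prior $P$ and posterior $Q$ over parameters, if $P$ and $Q$ are Gaussians centred on the pretrained weights and the adapted weights respectively, $\mathrm{KL}(Q \,\|\, P)$ grows with the squared norm of the LoRA update, which is why bounding the LoRA parameter norms tightens the bound.

```latex
% Standard McAllester-style PAC-Bayesian bound over n samples,
% holding with probability at least 1 - delta:
\[
  \mathbb{E}_{\theta \sim Q}\bigl[R(\theta)\bigr]
  \;\le\;
  \mathbb{E}_{\theta \sim Q}\bigl[\hat{R}_n(\theta)\bigr]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\bigl(2\sqrt{n}/\delta\bigr)}{2n}}
\]
```

To make the training objective concrete, here is a minimal sketch of what a segment-level robustness loss of this kind could look like. All names and choices below (`s2r2_loss`, `segment_mask`, the MSE and squared-Frobenius penalties, the $\lambda$ hyperparameters) are illustrative assumptions inferred from the abstract, not the authors' implementation.

```python
# Hypothetical sketch of an S^2R^2-style loss; the penalty forms and
# hyperparameter names are assumptions, not the paper's actual code.
import torch
import torch.nn.functional as F


def s2r2_loss(
    task_loss: torch.Tensor,          # standard fine-tuning loss on the clean input
    h_clean: torch.Tensor,            # hidden states, clean prompt      (B, T, D)
    h_pert: torch.Tensor,             # hidden states, perturbed prompt  (B, T, D)
    attn_clean: torch.Tensor,         # attention maps, clean prompt     (B, H, T, T)
    attn_pert: torch.Tensor,          # attention maps, perturbed prompt (B, H, T, T)
    segment_mask: torch.Tensor,       # (B, T) bool: True on semantically key segments
    lora_params: list[torch.Tensor],  # LoRA A/B matrices to norm-constrain
    lambda_align: float = 0.1,
    lambda_attn: float = 0.1,
    lambda_norm: float = 1e-4,
) -> torch.Tensor:
    # Segment-level alignment: match representations only on the key
    # segments, reflecting the semantic asymmetry the abstract describes,
    # rather than a holistic whole-prompt objective.
    mask = segment_mask.unsqueeze(-1).float()
    align = F.mse_loss(h_pert * mask, h_clean * mask)

    # Attention-shift penalty: discourage perturbation-induced drift in the
    # attention maps (squared Frobenius distance used here for simplicity).
    attn_shift = (attn_pert - attn_clean).pow(2).mean()

    # LoRA norm penalty: per the abstract, constraining these norms is what
    # formally tightens the PAC-Bayesian bound (via the KL term above).
    norm = sum(p.pow(2).sum() for p in lora_params)

    return task_loss + lambda_align * align + lambda_attn * attn_shift + lambda_norm * norm
```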
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13045