From Discrete to Continuous: One-Shot Multi-Position Character-Level Adversarial Attacks

19 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | CC BY 4.0
Keywords: OSCR-Attack: One-Shot Character Level Attacks through Self-Optimizing Continuous Relaxation
Abstract: Adversarial attacks have attracted growing attention across domains, including natural language processing (NLP). Character-level adversarial attacks preserve semantics, but they have received less attention because the discrete operations they rely on are considered costly and inefficient. Challenging this view, we introduce two adaptively learnable matrices that transform discrete choices into continuous representations, enabling automatic one-shot, multi-position, multi-character insertion. To optimise the two learnable matrices, we propose OSCR-Attack, an end-to-end framework based on gradient-based optimisation, with a conflict resolution strategy that maps the optimised continuous distributions back into discrete insertion operations. Extensive experiments on three benchmarks with three open-source LLMs show that OSCR-Attack improves attack success rate (ASR) by up to 16 percentage points and accelerates the attack by up to 6 times compared to recent baselines.
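The abstract does not give the shapes of the two matrices or the actual conflict resolution rule, so the following is only an illustrative sketch of the last step it describes: turning relaxed (continuous) distributions over insertion positions and characters back into discrete insertions. It assumes one logit matrix over insertion slots and one over an alphabet, and a hypothetical greedy rule in which the more confident insertion keeps a contested slot and the other falls back to its next-best free slot. The names `discretize` and `apply_insertions` are invented for this example, and the gradient-based optimisation of the logits is omitted.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def discretize(pos_logits, char_logits, alphabet):
    """Map relaxed distributions to discrete {slot: char} insertions.

    pos_logits:  k rows, one logit per insertion slot (assumed layout).
    char_logits: k rows, one logit per alphabet symbol (assumed layout).
    Conflict rule (hypothetical): process insertions from most to least
    confident; an insertion whose preferred slot is taken uses its
    next-best free slot. If there are more rows than slots, the
    surplus insertions are dropped.
    """
    pos_p = [softmax(r) for r in pos_logits]
    char_p = [softmax(r) for r in char_logits]
    order = sorted(range(len(pos_p)), key=lambda i: -max(pos_p[i]))
    insertions = {}
    for i in order:
        ranked = sorted(range(len(pos_p[i])), key=lambda s: -pos_p[i][s])
        for slot in ranked:
            if slot not in insertions:
                best = max(range(len(alphabet)), key=lambda c: char_p[i][c])
                insertions[slot] = alphabet[best]
                break
    return insertions

def apply_insertions(text, insertions):
    # Slot s means "insert before text[s]"; slot len(text) appends.
    out = []
    for s in range(len(text) + 1):
        if s in insertions:
            out.append(insertions[s])
        if s < len(text):
            out.append(text[s])
    return "".join(out)

# Two insertions that both prefer slot 1 of "good" (5 slots in total).
pos = [[0.0, 3.0, 0.0, 0.0, 0.0], [0.0, 2.0, 1.0, 0.0, 0.0]]
chars = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
chosen = discretize(pos, chars, "xyz")
perturbed = apply_insertions("good", chosen)
```

In this toy input both rows prefer slot 1; the first row is more confident and keeps it, while the second falls back to slot 2, so all insertions can be applied in a single pass over the text.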
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18592