Keywords: Language Modeling; Safety; Adversarial Attacks; LLM Security
Abstract: Adversarial attacks have attracted growing attention across domains, including natural language processing (NLP). Character-level adversarial attacks preserve semantics, yet they have received less attention because the discrete operations they rely on are considered costly and inefficient. Challenging this belief, we introduce two adaptively learnable matrices that transform discrete choices into continuous representations, enabling automatic one-shot, multi-position, multi-character insertion. To optimize these two matrices, we propose OSCR-Attack, an end-to-end gradient-based optimization framework with a conflict-resolution strategy that maps the optimized continuous distributions back into discrete insertion operations. Extensive experiments on three benchmarks with three open-source large language models (LLMs) show that OSCR-Attack improves the attack success rate (ASR) by up to 21.45 percentage points and accelerates the attack by up to 3.66 times compared to recent baselines.
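To make the abstract's core idea concrete, here is a minimal PyTorch sketch of the continuous-relaxation step it describes: two learnable matrices (one over insertion positions, one over candidate characters per position) are relaxed into continuous distributions via softmax, optimized with gradients, and then discretized. The dimensions, the loss, and the greedy discretization are all illustrative assumptions; the paper's actual objective and conflict-resolution rule are not specified in the abstract.

```python
import torch

# Hypothetical dimensions: L candidate insertion positions, V candidate characters.
L, V = 16, 52

# The two learnable matrices from the abstract (shapes assumed for illustration):
# one scores insertion positions, the other scores which character to insert.
pos_logits = torch.randn(L, requires_grad=True)       # position scores
char_logits = torch.randn(L, V, requires_grad=True)   # character scores per position

optimizer = torch.optim.Adam([pos_logits, char_logits], lr=0.1)

def adversarial_loss(pos_probs, char_probs):
    # Placeholder objective: the real method would use a differentiable
    # surrogate of the victim LLM's loss on the perturbed input.
    return -(pos_probs * char_probs.max(dim=-1).values).sum()

for _ in range(100):
    optimizer.zero_grad()
    # Relax the discrete insertion choices into continuous distributions.
    pos_probs = torch.softmax(pos_logits, dim=-1)
    char_probs = torch.softmax(char_logits, dim=-1)
    loss = adversarial_loss(pos_probs, char_probs)
    loss.backward()
    optimizer.step()

# Discretize in one shot: pick the top-k positions and, for each, the most
# likely character. A conflict-resolution step (e.g., dropping overlapping
# insertions) would be applied here in the full method.
with torch.no_grad():
    pos_probs = torch.softmax(pos_logits, dim=-1)
    char_probs = torch.softmax(char_logits, dim=-1)
    k = 3  # assumed number of insertions
    top_positions = torch.topk(pos_probs, k).indices
    chosen_chars = char_probs[top_positions].argmax(dim=-1)
    print(top_positions.tolist(), chosen_chars.tolist())
```

Because both matrices live in a continuous space, a single optimization run yields all positions and characters at once, which is what makes the attack "one-shot" rather than iterating over discrete edits.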
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Language Modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis, Theory
Languages Studied: English
Submission Number: 4942