Keywords: Large Language Models, Jailbreak, Malicious Code, Harmful Text
Abstract: Large Language Models (LLMs) demonstrate strong generalization capabilities but remain vulnerable to jailbreak attacks that induce restricted text or malicious code generation.
Recent structured jailbreaks embed adversarial intent into code-like templates and have demonstrated promising effectiveness.
However, existing approaches typically operate within a fixed template design and a single programming language, without considering language diversity or adaptive template evolution, thereby limiting the exploration of cross-language jailbreak behaviors. In this paper, we present MultiCodeAttack, a structured jailbreak framework that systematically explores and optimizes multi-language code templates. MultiCodeAttack maintains a diverse template library across programming languages, dynamically selects languages with higher attack effectiveness via a multi-armed bandit strategy, and evolves templates through semantic-preserving mutation guided by response-aware signals. Extensive experiments on 8 LLMs show that MultiCodeAttack outperforms existing jailbreak baselines, achieving 28.23\%–832.59\% higher harmful-text generation rates. On malicious code generation across 11 LLMs, MultiCodeAttack produces up to 136.22\% more malicious outputs than baseline methods. Our code is available at https://anonymous.4open.science/r/MultiCodeAttack/.
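The abstract's "multi-armed bandit strategy" for dynamically selecting higher-yield programming languages could be instantiated in many ways; the sketch below is a hypothetical illustration (not the authors' implementation) using the classic UCB1 rule, where each programming language is an arm and a successful jailbreak attempt yields reward 1. The class name, language list, and simulated success rates are all assumptions for the example.

```python
import math
import random

class LanguageBandit:
    """Hypothetical UCB1 bandit over programming languages (illustrative only)."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.counts = {lang: 0 for lang in self.languages}    # attempts per language
        self.rewards = {lang: 0.0 for lang in self.languages}  # cumulative successes

    def select(self):
        # Try every language once before applying the UCB1 formula.
        for lang in self.languages:
            if self.counts[lang] == 0:
                return lang
        total = sum(self.counts.values())
        # UCB1: empirical mean reward plus an exploration bonus.
        return max(
            self.languages,
            key=lambda lang: self.rewards[lang] / self.counts[lang]
            + math.sqrt(2 * math.log(total) / self.counts[lang]),
        )

    def update(self, lang, reward):
        self.counts[lang] += 1
        self.rewards[lang] += reward

# Simulated attack outcomes: pretend templates in "c" succeed most often.
random.seed(0)
bandit = LanguageBandit(["python", "c", "go"])
success_rate = {"python": 0.2, "c": 0.7, "go": 0.3}
for _ in range(300):
    lang = bandit.select()
    bandit.update(lang, 1.0 if random.random() < success_rate[lang] else 0.0)

print(bandit.counts)  # the bandit concentrates attempts on the strongest arm
```

Under these simulated rates the bandit shifts most of its 300 attempts toward the language with the highest success probability, which is the adaptive-selection behavior the abstract attributes to MultiCodeAttack.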
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Language Modeling, Safety and alignment, Red teaming
Languages Studied: English, programming languages
Submission Number: 4881