Keywords: Large Language Models, Jailbreak, Malicious Code, Harmful Text
Abstract: Large Language Models (LLMs) demonstrate strong generalization capabilities but remain vulnerable to jailbreak attacks that induce restricted text or malicious code generation.
Recent structured jailbreaks embed adversarial intent into code-like templates and have demonstrated promising effectiveness.
However, existing approaches typically operate within a fixed template design and a single programming language, without considering language diversity or adaptive template evolution, thereby limiting the exploration of cross-language jailbreak behaviors. In this paper, we present MultiCodeAttack, a structured jailbreak framework that systematically explores and optimizes multi-language code templates. MultiCodeAttack maintains a diverse template library across programming languages, dynamically selects languages with higher attack effectiveness via a multi-armed bandit strategy, and evolves templates through semantic-preserving mutation guided by response-aware signals. Extensive experiments on 8 LLMs show that MultiCodeAttack outperforms existing jailbreak baselines, achieving 28.23\%–832.59\% higher harmful-text generation rates. On malicious code generation across 11 LLMs, MultiCodeAttack produces up to 136.22\% more malicious outputs than baseline methods. Our code is available at https://anonymous.4open.science/r/MultiCodeAttack/.
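The abstract's "multi-armed bandit strategy" for dynamically selecting higher-yield programming languages could be instantiated in many ways; the sketch below is a hypothetical illustration (not the authors' implementation) using the classic UCB1 rule, where each programming language is an arm and a successful jailbreak attempt yields reward 1. The class name, language list, and simulated success rates are all assumptions for the example.

```python
import math
import random

class LanguageBandit:
    """Hypothetical UCB1 bandit over programming languages (illustrative only)."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.counts = {lang: 0 for lang in self.languages}    # attempts per language
        self.rewards = {lang: 0.0 for lang in self.languages}  # cumulative successes

    def select(self):
        # Try every language once before applying the UCB1 formula.
        for lang in self.languages:
            if self.counts[lang] == 0:
                return lang
        total = sum(self.counts.values())
        # UCB1: empirical mean reward plus an exploration bonus.
        return max(
            self.languages,
            key=lambda lang: self.rewards[lang] / self.counts[lang]
            + math.sqrt(2 * math.log(total) / self.counts[lang]),
        )

    def update(self, lang, reward):
        self.counts[lang] += 1
        self.rewards[lang] += reward

# Simulated attack outcomes: pretend templates in "c" succeed most often.
random.seed(0)
bandit = LanguageBandit(["python", "c", "go"])
success_rate = {"python": 0.2, "c": 0.7, "go": 0.3}
for _ in range(300):
    lang = bandit.select()
    bandit.update(lang, 1.0 if random.random() < success_rate[lang] else 0.0)

print(bandit.counts)  # the bandit concentrates attempts on the strongest arm
```

Under these simulated rates the bandit shifts most of its 300 attempts toward the language with the highest success probability, which is the adaptive-selection behavior the abstract attributes to MultiCodeAttack.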
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Language Modeling, Safety and alignment, Red teaming
Languages Studied: English, programming languages
Submission Number: 4881