Jailbreak Connectivity: Towards Diverse, Transferable, and Universal MLLM Jailbreak

ICLR 2026 Conference Submission 19677 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · CC BY 4.0

Keywords: Jailbreak Attacks, Multimodal Large Language Models (MLLMs), Image-based Jailbreak, Transferable Attacks, Universal Jailbreak.

Abstract: While multimodal large language models (MLLMs) have shown immense potential, their susceptibility to security threats, particularly through the visual modality, poses serious concerns for real-world deployment. Existing jailbreak methods can induce harmful responses but suffer from three key limitations: a lack of diversity, poor transferability across models, and ineffectiveness against multiple targets simultaneously. To address these challenges, we introduce the Jailbreak Connectivity (JC) framework, which has three novel components. First, it generates a diverse range of jailbreak attacks by constructing a continuous path in image space that connects two jailbreak images. Second, it improves transferability by integrating two types of surrogate classifiers, Safety Classifiers and Jailbreak Success Predictors, to guide the optimization process. Third, it enables universal jailbreak attacks by changing the attack objective to elicit any harmful content rather than the answer to one specific harmful question, thereby inducing the target MLLM to answer a broad range of harmful queries. Our experiments on the SafetyBench dataset show that JC achieves an average attack success rate (ASR) of 79.62%, a substantial 36.24% increase over the best-performing state-of-the-art method. In addition, JC obtains the lowest perplexity in 12 out of 13 scenarios, indicating that the generated harmful responses are more fluent and natural. This work offers a promising approach for generating diverse, transferable, and universal jailbreak attacks, highlighting critical security vulnerabilities in current MLLMs. Warning: This paper contains data, prompts, and model outputs that are offensive in nature.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 19677
