Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models

15 Sept 2025 (modified: 27 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Vision-Language Models, Jailbreak attacks, Black-box attack methods
TL;DR: This paper introduces CAMO, a novel black-box jailbreak attack that exploits LVLMs' cross-modal reasoning to stealthily bypass safety mechanisms with high efficiency and strong transferability.
Abstract: Large Vision-Language Models (LVLMs) demonstrate exceptional performance across multimodal tasks, yet remain vulnerable to jailbreak attacks that bypass their safety mechanisms. Existing jailbreak methods suffer from two critical limitations: insufficient stealth against input-level defense filters and high computational cost from lengthy prompts or iterative procedures. In this work, we present Cross-modal Adversarial Multimodal Obfuscation (CAMO), a black-box jailbreak framework that decomposes harmful instructions into benign-looking textual and visual clues. CAMO leverages LVLMs' cross-modal reasoning to reconstruct the attack intent, while each component appears harmless in isolation and thus evades defense filters. Our compositional obfuscation design achieves superior efficiency, using only 12.6% of the tokens required by existing methods while achieving attack success rates of 81.82% on GPT-4.1-nano and 93.94% on DeepSeek-R1. CAMO also bypasses multiple defense mechanisms with a 100% evasion rate, demonstrating effectiveness across both open-source and closed-source models. This work exposes critical vulnerabilities in current multimodal safety protocols and underscores the need for more sophisticated defense strategies.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 5630