AMGE: Adaptive Modality Gap Exploitation for Adversarial Attacks on Vision-Language Models

TMLR Paper 6329 Authors

28 Oct 2025 (modified: 29 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Multimodal large language models unify visual perception with natural language understanding, yet remain vulnerable to adversarial manipulation. Existing jailbreak attacks exploit vision-text vulnerabilities through pixel-space perturbations and prompt optimization, overlooking a fundamental weakness: the modality gap, i.e., the geometric separation between image and text embeddings. We present Adaptive Modality Gap Exploitation (AMGE), a framework that operates within the embedding manifold through gap-aware perturbation optimization and cross-attention-mediated gradient flow. Our framework characterizes the modality gap via empirical directional bias estimation, formulates attacks as geometric exploitation in which gradient updates align with gap vectors, and employs momentum-based ensemble aggregation for universal transferability across queries and architectures. Evaluation across four multimodal LLMs (LLaVA-1.5-7B/13B, Qwen-VL, Qwen2-VL) demonstrates a 90.2% attack success rate with 79.1% transferability, requiring only 127 queries (3× fewer than competing methods) while maintaining 87.5% semantic preservation. AMGE sustains 62.3% effectiveness against five defenses, outperforming existing attacks by 23.7%. This work establishes embedding-space geometric exploitation as a principled paradigm for exposing vulnerabilities in multimodal alignment architectures.
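To make the abstract's gap-aware attack concrete, below is a minimal sketch of how a gap-vector-aligned perturbation with momentum aggregation could look. The encoder interfaces, loss terms, and hyperparameters are assumptions for illustration only; the abstract does not specify AMGE's actual objective or update rule.

```python
# Illustrative sketch only: hypothetical gap-aware perturbation step.
# image_encoder / text_encoder, the loss, and all hyperparameters are assumed,
# not taken from the paper.
import torch

def estimate_gap_vector(image_encoder, text_encoder, images, texts):
    """Empirical directional bias: mean text embedding minus mean image embedding."""
    with torch.no_grad():
        img_emb = image_encoder(images)   # (N, d) image embeddings
        txt_emb = text_encoder(texts)     # (N, d) text embeddings
    gap = txt_emb.mean(dim=0) - img_emb.mean(dim=0)
    return gap / gap.norm()               # unit-length gap direction

def gap_aligned_step(image, delta, momentum, gap_dir, image_encoder, target_emb,
                     alpha=1e-2, mu=0.9, eps=8 / 255):
    """One hypothetical update: push the adversarial image embedding toward a
    target text embedding and along the modality-gap direction, accumulating
    gradients with momentum (the paper also aggregates over an ensemble)."""
    delta = delta.clone().requires_grad_(True)
    emb = image_encoder(image + delta)
    loss = (-torch.cosine_similarity(emb, target_emb, dim=-1).mean()
            - (emb @ gap_dir).mean())
    loss.backward()
    grad = delta.grad
    momentum = mu * momentum + grad / (grad.abs().mean() + 1e-12)
    delta = (delta - alpha * momentum.sign()).clamp(-eps, eps).detach()
    return delta, momentum
```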
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jinghui_Chen1
Submission Number: 6329