Abstract: Multimodal large language models (MLLMs) integrate information from multiple modalities, including text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful, this increased expressiveness introduces new and amplified vulnerabilities to adversarial manipulation. This survey provides a comprehensive and systematic analysis of adversarial threats to MLLMs, moving beyond enumerating attack techniques to explain the underlying causes of model susceptibility. We introduce a taxonomy that organizes adversarial attacks according to attacker objectives, unifying diverse attack surfaces across modalities and deployment settings. We also present a vulnerability-centric analysis that links integrity attacks, safety and jailbreak failures, control and instruction hijacking, and training-time poisoning to shared architectural and representational weaknesses in multimodal systems. Together, the taxonomy and analysis provide an explanatory foundation for understanding adversarial behavior in MLLMs and inform the development of more robust and secure multimodal language systems.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Based on the reviewers' suggestions and the AE's feedback, we have made the following key changes since the last submission:
1. Added a one-paragraph quantitative summary at the end of Section 3.1, reporting the total number of papers analyzed.
2. Added a brief statement of the paper inclusion policy in the Introduction (end of Section 1), summarizing the scope and selection criteria with a reference to the full methodology in Appendix A.
Supplementary Material: pdf
Assigned Action Editor: ~Sebastian_Tschiatschek1
Submission Number: 6141