Abstract: Multimodal large language models (MLLMs) integrate information from multiple modalities, including text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful, this increased expressiveness introduces new and amplified vulnerabilities to adversarial manipulation. This survey provides a comprehensive and systematic analysis of adversarial threats to MLLMs, moving beyond enumerating attack techniques to explain the underlying causes of model susceptibility. We introduce a taxonomy that organizes adversarial attacks according to attacker objectives, unifying diverse attack surfaces across modalities and deployment settings. We also present a vulnerability-centric analysis that links integrity attacks, safety and jailbreak failures, control and instruction hijacking, and training-time poisoning to shared architectural and representational weaknesses in multimodal systems. Together, the taxonomy and analysis provide an explanatory foundation for understanding adversarial behavior in MLLMs and inform the development of more robust and secure multimodal language systems.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Based on the reviewers' suggestions and the AE's feedback, we have made the following key changes since the last submission:
1. Added a one-paragraph quantitative summary at the end of Section 3.1, reporting the total number of papers analyzed.
2. Added a brief statement of the paper inclusion policy in the Introduction (end of Section 1), summarizing the scope and selection criteria with a reference to the full methodology in Appendix A.
Supplementary Material: pdf
Assigned Action Editor: ~Sebastian_Tschiatschek1
Submission Number: 6141