Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

TMLR Paper6141 Authors

07 Oct 2025 (modified: 14 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Multimodal large language models (MLLMs) integrate and process information from multiple modalities, such as text, images, audio, and video, enabling complex tasks such as audio translation and visual question answering. While powerful, this multimodal complexity introduces novel vulnerabilities to sophisticated adversarial attacks. This survey provides a comprehensive overview of this rapidly expanding field, systematically categorizing attacks that range from manipulations of a single modality (e.g., perturbed images or audio) to those exploiting cross-modal interactions. We examine how these attacks exploit weaknesses in model fusion, attention mechanisms, and representation learning, and we analyze their potential real-world consequences.
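As a concrete illustration of the single-modality image perturbations mentioned in the abstract, the sketch below applies an FGSM-style gradient-sign step to an input image before it reaches a model's vision pathway. The model, loss function, and epsilon budget here are illustrative assumptions, not a method taken from the surveyed works.

```python
# Minimal sketch of a single-modality (image) adversarial perturbation in the
# FGSM style. `model`, `loss_fn`, and `epsilon` are illustrative placeholders.
import torch

def fgsm_perturb(model, image, target, loss_fn, epsilon=2 / 255):
    """Return an adversarially perturbed copy of `image` (pixel values in [0, 1])."""
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), target)             # forward pass on the clean image
    loss.backward()                                  # gradient of the loss w.r.t. the pixels
    perturbed = image + epsilon * image.grad.sign()  # one step in the gradient-sign direction
    return perturbed.clamp(0.0, 1.0).detach()        # keep the result a valid image
```

In a multimodal setting, the same perturbed tensor would simply be fed to the model's image encoder alongside an unmodified text prompt; cross-modal attacks, by contrast, optimize the perturbation against the fused representation rather than a single encoder.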
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: NA
Assigned Action Editor: ~Sebastian_Tschiatschek1
Submission Number: 6141