Allusive Adversarial Examples via Latent Space in Multimodal Large Language Models

ICLR 2026 Conference Submission 9994 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Multimodal models; LLM; LLM Security
Abstract: Multimodal large language models (MLLMs) generate text by conditioning on heterogeneous inputs such as images and text. We present allusive adversarial examples, a new class of attacks that imperceptibly encode target instructions into non-textual modalities. Unlike prior adversarial examples, these attacks manipulate model outputs without altering the textual instruction. To construct them, we introduce a practical learning framework that leverages cross-modal alignment and exploits the shared latent space of MLLMs. Empirical evaluation on LLaVA, InternVL, Qwen-VL, and Gemma demonstrates that our method produces efficient and effective adversarial examples, uncovering a critical security risk in multimodal systems.
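The abstract does not specify how the attack is constructed. The snippet below is a minimal, hypothetical sketch of one way such a latent-space attack could be instantiated: a PGD-style optimization that perturbs an image, within an L-infinity budget, so that its embedding in the shared latent space aligns with the embedding of a target instruction. All module names, interfaces, and hyperparameters (`vision_encoder`, `text_encoder`, `eps`, `steps`, pooled text output) are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch (not the paper's method): embed a target instruction
# into an image by aligning latent representations in a shared embedding space.
import torch
import torch.nn.functional as F


def allusive_attack(image, target_instruction, vision_encoder, text_encoder,
                    tokenizer, eps=8 / 255, step_size=1 / 255, steps=500):
    """Optimize an imperceptible perturbation (L-inf ball of radius eps) so the
    image's latent representation matches that of the target instruction.

    `vision_encoder` and `text_encoder` are assumed to be frozen modules that
    project into the model's shared latent space; their exact interfaces are
    assumptions for illustration only.
    """
    # Target embedding in the shared latent space (kept fixed during the attack).
    with torch.no_grad():
        tokens = tokenizer(target_instruction, return_tensors="pt")
        target_emb = text_encoder(**tokens).pooler_output  # assumed pooled output
        target_emb = F.normalize(target_emb, dim=-1)

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv_emb = vision_encoder((image + delta).clamp(0, 1))
        adv_emb = F.normalize(adv_emb, dim=-1)
        # Minimize cosine distance between the image latent and the
        # instruction latent, i.e. maximize their alignment.
        loss = 1.0 - (adv_emb * target_emb).sum(dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()  # signed-gradient (PGD) step
            delta.clamp_(-eps, eps)                 # imperceptibility budget
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

Under these assumptions, the resulting image looks unchanged to a human but carries the target instruction in latent space, so the textual prompt accompanying it never needs to be modified, which is the property the abstract highlights.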
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9994