Evolving Safety Landscape of Multi-modal Large Language Models: A Survey of Emerging Threats and Safeguards
Keywords: Multi-modal LLM, AI safety, AI robustness
Abstract: Multi-modal large language models (MLLMs) integrate heterogeneous modalities through modality alignment and fusion, enabling stronger understanding and reasoning. However, this architectural shift reshapes the safety landscape of machine learning. Increased model complexity and cross-modal interactions give rise to novel threats, including compromised modality integration, modality misalignment, and fused safety risks; these threats reflect a shift in threat modeling beyond uni-modal assumptions. This shift, in turn, imposes new constraints on safety solutions that existing frameworks rooted in uni-modal learning do not capture.
Motivated by these challenges, this survey provides a systematic analysis of the evolving safety landscape of MLLMs.
We first propose a taxonomy of safety threats grounded in multi-modality and analyze the resulting shifts in threat models, covering adversarial attacks, data poisoning, jailbreaks, and hallucinations.
We then summarize updated safety assumptions and organize recent advances in MLLM safety strategies accordingly.
Finally, we discuss open challenges and future directions to inform the development of more principled and scalable safety mechanisms for multimodal systems.
Submission Number: 252