PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Safety Alignment; Vision Language Model; Reasoning
TL;DR: This paper introduces PRISM, a framework that teaches VLMs a structured reasoning process for identifying harmful intent, highlighting the critical trade-off between safety and utility in VLM alignment.
Abstract: Safeguarding vision-language models (VLMs) is a critical challenge: existing methods often suffer from over-defense, which harms utility, or rely on shallow alignment that fails to detect complex threats requiring deep reasoning. To this end, we introduce **PRISM** (**P**rincipled **R**easoning for **I**ntegrated **S**afety in **M**ultimodality), a System-2-like framework that aligns VLMs by embedding a structured, safety-aware reasoning process. Our framework consists of two key components: PRISM-CoT, a dataset that teaches safety-aware chain-of-thought reasoning, and PRISM-DPO, a preference dataset generated via Monte Carlo Tree Search (MCTS) that further refines this reasoning through Direct Preference Optimization, helping the model learn a precise safety boundary. Comprehensive evaluations demonstrate PRISM's effectiveness: it achieves remarkably low attack success rates, including 0.15% on JailbreakV-28K for Qwen2-VL and a 90% improvement over the previous best method on VLBreak for LLaVA-1.5. PRISM also exhibits strong robustness against adaptive attacks, significantly increasing computational costs for adversaries, and generalizes effectively to out-of-distribution challenges, reducing the attack success rate to just 8.70% on the challenging multi-image MIS benchmark. Remarkably, this robust defense is achieved while preserving model utility.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 5937