Evaluating Synergistic Safety in Unified Multimodal Architectures for Harmful Image and Text Generation

Published: 17 Dec 2025, Last Modified: 27 Feb 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0
Abstract: As Large Multimodal Models (LMMs) shift toward inherently unified architectures, assessing their safety in synergistic harmful image-text generation has become essential. Synergistic harms arise when the text and image modalities are coordinated to produce harmful outputs that exceed the danger of their individual unimodal components. We introduce Syn-Safety, a comprehensive benchmark that evaluates LMM resilience in cross-modal collaborative scenarios by measuring the harmfulness gain attributable to cross-modal synergy. The framework incorporates high-potential multimodal samples and synthesized disinformation queries to stress-test model safety boundaries. To validate these vulnerabilities, we develop a synergistic hijacking strategy composed of In-Context Reskinning (ICR) and Logical Consistency Redirection (LCR). ICR uses few-shot in-context examples to disguise adversarial intent within benign semantic shells, desensitizing safety filters, while LCR hijacks the model's reasoning trajectory through a "plan-then-generate" paradigm. By forcing the model to commit to a neutral logical plan, we exploit its internal drive for consistency to trigger the synchronized production of harmful image-text content. Evaluations on leading architectures show that the attacks in Syn-Safety effectively bypass current safety alignment. These findings demonstrate that the structural helpfulness and logical coherence of unified models can be systematically weaponized, underscoring the need for logic-aware defenses in multimodal safety research.