SynergyRisk: A Benchmark and Hijacking Framework for Probing Cross-Modal Safety Failures in Unified Large Multimodal Models

Yuxuan Yuan

Published: 29 Jan 2026, Last Modified: 27 Feb 2026, OpenReview Archive Direct Upload, Everyone, CC BY 4.0

Abstract: As unified Large Multimodal Models (LMMs) that jointly reason over visual and textual inputs evolve rapidly, assessing their robustness against coordinated harmful generation has become increasingly important. Beyond isolated unimodal misuse, emergent UnifiedAttack risks arise when image and text modalities interact to amplify harmful intent through structured cross-modal cooperation. In this work, we present SynergyRisk, a comprehensive evaluation benchmark that systematically measures safety degradation under synergistic image-text generation scenarios. The benchmark emphasizes cross-modal amplification: it quantifies the incremental harmfulness introduced by multimodal coordination relative to single-modality baselines. It includes carefully curated multimodal-sensitive prompts and a newly constructed collection of compositional misinformation cases designed to stress unified architectures. To further examine alignment weaknesses, we introduce a targeted cross-modal hijacking framework composed of Contextual Intent Camouflaging (CIC) and Cognitive Planning Injection (CPI). CIC uses few-shot contextualization to embed adversarial objectives within ostensibly benign multimodal narratives, thereby attenuating safety-triggered responses. CPI manipulates the model’s reasoning trajectory through a staged plan-then-generate protocol, exploiting the consistency pressures inherent in unified reasoning pipelines to synchronize harmful visual and textual outputs. Extensive experiments across state-of-the-art unified LMMs demonstrate that synergistic attacks substantially outperform UnifiedAttack baselines in bypassing alignment safeguards. Our results indicate that the very properties that enable coherent multimodal reasoning can be systematically exploited to induce coordinated harmful outputs. These findings underscore the need for synergy-aware, reasoning-level defense mechanisms in next-generation multimodal systems.