Joint Safety Risks in Unified Generative Models
Abstract: Large Multimodal Models (LMMs) are moving from modular vision-language pipelines toward unified architectures capable of producing coordinated textual and visual content. This shift raises a safety risk, synergistic harmful image-text generation, that is not adequately captured by conventional text-only or image-only evaluation. In many cases, the synergistic risk exceeds that of the individual components and depends less on either modality alone than on the semantic agreement between them. We study this problem as joint safety failure in image-text generation, where cross-modal collaboration produces a measurable increase in harmfulness beyond the sum of unimodal risks. To support systematic evaluation, we design a synergistic hijacking framework, UnifiedAttack, and a benchmark that emphasizes multimodal-dependent prompts, synthetic disinformation cases, cognitive planning injection (CPI), in-context reskinning (ICR), and scenarios in which unsafe intent is distributed across visual and linguistic channels. We further examine how reasoning consistency and benign contextual framing can weaken existing alignment mechanisms by allowing models to preserve an underlying unsafe goal while presenting each intermediate step as acceptable. Experiments with recent unified LMMs reveal persistent gaps between unimodal safety filtering and genuine multimodal safety. These results suggest that future defenses in synergistic generation tasks should be logic-aware, assessing the combined intent, narrative coherence, and downstream impact of image-text outputs rather than treating modalities independently.