VideoShield: A Unified Framework for Multimodal Risk Detection and Control in Video Generative Models

06 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: AI safety, Video Generation
TL;DR: We develop a novel framework to safeguard video generative models.
Abstract: Recent progress in video generative models has enabled the creation of high-quality videos from multimodal prompts that combine text and images. While these systems offer enhanced controllability and creative potential, they also introduce new safety risks, as harmful content can emerge not only from individual modalities but also from their interaction. Existing safety methods, primarily designed for unimodal settings, struggle to handle such compositional risks. To address this challenge, we present VideoShield, a unified safeguard framework for proactively detecting and mitigating unsafe semantics in multimodal video generation. VideoShield operates in two stages: first, a contrastive detection module identifies latent safety risks by projecting fused image-text inputs into a structured concept space; second, a semantic suppression mechanism intervenes in the embedding space to remove unsafe concepts during generation. To support this framework, we introduce ConceptRisk, a large-scale, concept-centric dataset that captures a wide range of multimodal safety scenarios, including single-modality, compositional, and adversarial risks. Experiments across multiple benchmarks show that VideoShield consistently outperforms existing baselines, achieving state-of-the-art results in both risk detection and safe video generation.
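The abstract describes a two-stage pipeline: contrastive detection of unsafe concepts from a fused image-text embedding, followed by suppression of those concepts in the embedding space. The following is an illustrative sketch only; the submission does not include code, so the function names (`detect_risk`, `suppress_unsafe`), the cosine-similarity thresholding, and the orthogonal-projection form of suppression are all assumptions about how such a pipeline might be realized, not the authors' implementation.

```python
# Hedged sketch of a two-stage detect-then-suppress pipeline.
# All names and the projection-based suppression are assumptions,
# not the paper's actual method.
import numpy as np

def detect_risk(fused_embed: np.ndarray,
                concept_bank: np.ndarray,
                threshold: float = 0.3) -> np.ndarray:
    """Stage 1 (assumed): flag unsafe concepts whose embeddings align
    with the fused image-text embedding (cosine similarity > threshold).
    Returns the indices of flagged rows in concept_bank."""
    f = fused_embed / np.linalg.norm(fused_embed)
    c = concept_bank / np.linalg.norm(concept_bank, axis=1, keepdims=True)
    sims = c @ f
    return np.where(sims > threshold)[0]

def suppress_unsafe(embed: np.ndarray,
                    unsafe_dirs: np.ndarray) -> np.ndarray:
    """Stage 2 (assumed): remove the unsafe-concept subspace from the
    conditioning embedding by orthogonal projection."""
    if unsafe_dirs.size == 0:
        return embed
    # Orthonormal basis of the unsafe subspace via QR decomposition.
    q, _ = np.linalg.qr(unsafe_dirs.T)
    return embed - q @ (q.T @ embed)
```

Under these assumptions, the cleaned embedding is orthogonal to every flagged concept direction, so downstream generation conditioned on it no longer carries those semantics (to first order).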
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 2608