Keywords: AI Safety, Copyright-Preservation
TL;DR: We propose Safe-Sora, the first framework that integrates graphical watermarks directly into the video generation process.
Abstract: The explosive growth of generative video models has amplified the demand for
reliable copyright preservation of AI-generated content. Despite its popularity in
image synthesis, invisible generative watermarking remains largely underexplored
in video generation. To address this gap, we propose Safe-Sora, the first framework
to embed graphical watermarks directly into the video generation process. Motivated by the observation that watermarking performance is closely tied to the visual
similarity between the watermark and cover content, we introduce a hierarchical
coarse-to-fine adaptive matching mechanism. Specifically, the watermark image is
divided into patches, each assigned to the most visually similar video frame, and
further localized to the optimal spatial region for seamless embedding. To enable
spatiotemporal fusion of watermark patches across video frames, we develop a 3D
wavelet transform-enhanced Mamba architecture with a novel scanning strategy,
effectively modeling long-range dependencies during watermark embedding and
retrieval. To the best of our knowledge, this is the first attempt to apply state space
models to watermarking, opening new avenues for efficient and robust watermark
protection. Extensive experiments demonstrate that Safe-Sora achieves state-of-the-art
performance in video quality, watermark fidelity, and robustness, which we attribute
largely to the proposed matching mechanism and Mamba-based architecture. Code and
additional supporting materials are provided in the supplementary material.
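The coarse-to-fine matching described in the abstract can be illustrated with a minimal sketch. This is not the Safe-Sora implementation: the function names, the use of grayscale arrays, mean intensity as the coarse (frame-level) similarity, and mean squared error over non-overlapping regions as the fine (spatial) similarity are all assumptions chosen for brevity.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W) array into non-overlapping (patch, patch) tiles, row-major."""
    h, w = image.shape
    return [image[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]

def best_region(frame, tile):
    """Return (row, col) of the non-overlapping frame region with the lowest MSE to tile."""
    p = tile.shape[0]
    best_pos, best_dist = None, float("inf")
    for r in range(0, frame.shape[0] - p + 1, p):
        for c in range(0, frame.shape[1] - p + 1, p):
            dist = float(np.mean((frame[r:r + p, c:c + p] - tile) ** 2))
            if dist < best_dist:
                best_pos, best_dist = (r, c), dist
    return best_pos

def coarse_to_fine_match(watermark, video, patch):
    """Assign each watermark tile to a frame (coarse), then to a region in it (fine).

    watermark: (H, W) array; video: (T, H', W') array.
    Returns one (frame_index, row, col) tuple per tile.
    """
    frame_means = video.mean(axis=(1, 2))
    assignments = []
    for tile in split_into_patches(watermark, patch):
        # Coarse step (assumed proxy): frame whose mean intensity is closest to the tile's.
        frame_idx = int(np.argmin(np.abs(frame_means - tile.mean())))
        # Fine step (assumed proxy): region of that frame with the lowest MSE to the tile.
        row, col = best_region(video[frame_idx], tile)
        assignments.append((frame_idx, row, col))
    return assignments
```

In this toy version, similarity is measured on raw pixel values; the abstract does not specify the similarity measure or whether matching happens in pixel or latent space, so those choices here are placeholders.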
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 14198