Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment

Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment

ICLR 2026 Conference Submission16979 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Watermark, VLM, Dataset Copyright Protection

TL;DR: An effective watermarking framework for protecting the copyright of multimodal knowledge, especially image knowledge, in RaaS.

Abstract: As Retrieval-Augmented Generation (RAG) evolves into service-oriented platforms (Rag-as-a-Service) with shared knowledge bases, protecting the copyright of contributed data becomes essential. Existing watermarking methods in RAG focus solely on textual knowledge, leaving image knowledge unprotected. In this work, we propose \textit{AQUA}, the first watermark framework for image knowledge protection in Multimodal RAG systems. \textit{AQUA} embeds semantic signals into synthetic images using two complementary methods: acronym-based triggers and spatial relationship cues. These techniques ensure watermark signals survive indirect watermark propagation from image retriever to textual generator, being efficient, effective and imperceptible. Experiments across diverse models and datasets show that \textit{AQUA} enables robust, stealthy, and reliable copyright tracing, filling a key gap in multimodal RAG protection.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 16979

Loading