Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach
Keywords: Dataset Watermarking; Diffusion Model; Copyright Protection
TL;DR: This paper first establishes a generalized threat model and subsequently introduces a comprehensive framework for evaluating dataset watermarking methods, comprising three dimensions: Universality, Transmissibility, and Robustness.
Abstract: Recently, numerous fine-tuning techniques for diffusion models have been developed, enabling diffusion models to generate content that closely resembles a specific image set, such as specific facial identities or artistic styles. However, this advancement also poses potential security risks. The primary risk comes from copyright violations arising from the unauthorized use of public-domain images to fine-tune diffusion models. Furthermore, if such models generate harmful content linked to the source images, tracing the origin of the fine-tuning data is crucial for clarifying responsibility. To achieve fine-tuning traceability of customized diffusion models, dataset watermarking for diffusion models has been proposed, which embeds imperceptible watermarks into images that require traceability. Notably, even after the watermarked images are used to fine-tune diffusion models, the watermarks remain detectable in the generated outputs. However, existing dataset watermarking approaches lack a unified framework for performance evaluation, thereby limiting their effectiveness in practical scenarios. To address this gap, this paper first establishes a generalized threat model and subsequently introduces a comprehensive framework for evaluating dataset watermarking methods, comprising three dimensions: Universality, Transmissibility, and Robustness. Our evaluation results demonstrate that existing methods exhibit universality across diverse fine-tuning approaches and tasks, as well as transmissibility even when only a small proportion of watermarked images is used. In terms of robustness, existing methods perform well against common image processing operations, but such operations do not reflect realistic threat scenarios. To address this issue, this paper proposes a practical watermark removal method that can completely remove dataset watermarks without affecting fine-tuning, revealing their vulnerabilities and pointing to a new challenge for future research.
Primary Area: datasets and benchmarks
Submission Number: 25222