A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models

Rui Ma; Qiang Zhou; Bangjun Xiao; Daquan Zhou; Xiuyu Li; Aishani Singh; Yi Qu; Kurt Keutzer; Xiaodong Xie; Jingtong Hu; Zhen Dong; Shanghang Zhang

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Copyright Protection, Stable diffusion

Abstract: Copyright is a legal right that grants creators the exclusive authority to reproduce, distribute, and profit from their creative works. Recent advances in text-to-image generation pose significant challenges to copyright protection: these models can learn from unauthorized content, artistic creations, and portraits, and can then be used to generate and disseminate content without control. In particular, Stable Diffusion, an emerging text-to-image model, increases the risk of copyright infringement and unauthorized distribution. Yet systematic studies evaluating the potential correlation between content generated by Stable Diffusion and copyright infringement are currently lacking. Conducting such studies faces several challenges, including i) the inherent ambiguity surrounding copyright infringement in text-to-image models, ii) the absence of a large-scale inference dataset, and iii) the lack of standardized metrics for defining copyright infringement. This work provides the first large-scale standardized dataset and benchmark on copyright protection. Specifically, we propose a pipeline that coordinates CLIP, ChatGPT, and diffusion models to generate a dataset containing anchor images, their corresponding prompts, and images generated by text-to-image models, reflecting potential abuses of copyright. Furthermore, we propose a suite of evaluation metrics to judge the effectiveness of copyright protection methods. The proposed dataset, benchmark library, and evaluation metrics will be open-sourced to facilitate future research and application.

Supplementary Material: pdf

Primary Area: datasets and benchmarks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6995
