Automatic Jailbreaking of Text-to-Image Generative AI Systems for Copyright Infringement

Minseon Kim; Hyomin Lee; Boqing Gong; Huishuai Zhang; Sung Ju Hwang

Automatic Jailbreaking of Text-to-Image Generative AI Systems for Copyright Infringement

Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

26 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Copyright, jailbreaking, T2I model

Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious contents by circumventing the alignment in LLMs, a phenomenon often referred to as jailbreaking. However, most of the previous works only focused on the text-based jailbreaking in LLMs, and the jailbreaking of the text-to-image (T2I) generation system has been relatively overlooked. In this paper, we first evaluate the safety of the commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 5\% and 11.25\% of the attacks with naive prompts, respectively, while ChatGPT blocks 96.25\% of them. Then, we further propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts that maximize degree of violation from the generated images without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks the Copilot and ChatGPT with 0.0\% and 6.25\% block rate, respectively, enabling the generation of copyrighted content 73.3\% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning techniques, but find them inadequate, highlighting the necessity of stronger defense mechanisms.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6610

Loading