CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models
TL;DR: We reveal the role of latent representation distortion in protective perturbations and propose Contrastive Adversarial Training, an adaptive attack that exposes their robustness weaknesses.
Abstract: Latent diffusion models have recently demonstrated strong capabilities across many downstream image synthesis tasks.
However, customization of latent diffusion models using unauthorized data can severely compromise the privacy and intellectual property rights of data owners.
Adversarial examples have been developed as protective perturbations against unauthorized data usage: imperceptible noise is added to customization samples to prevent diffusion models from learning them effectively.
In this paper, we first reveal, through qualitative and quantitative experiments, that adversarial examples are effective as protective perturbations in latent diffusion models primarily because they distort the latent representations of the protected images.
We then propose Contrastive Adversarial Training (CAT), an adaptive attack that uses lightweight adapters to circumvent these protection methods and expose their lack of robustness.
Extensive experiments demonstrate that CAT significantly reduces the effectiveness of protective perturbations in customization, and we urge the community to reconsider and improve the robustness of existing protective perturbations.
The code is available at https://github.com/senp98/CAT.
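As a concrete illustration of the latent-distortion finding, the sketch below compares the VAE latents of a clean image and its protectively perturbed counterpart. This is a minimal, hypothetical example rather than the paper's evaluation code: the checkpoint `stabilityai/sd-vae-ft-mse` and the file names `clean.png` / `protected.png` are placeholder assumptions.

```python
# Hypothetical sketch: measuring how much a protective perturbation
# distorts an image's latent representation in a latent diffusion model.
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# VAE encoder of a latent diffusion model (placeholder checkpoint).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device).eval()

to_tensor = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),               # scale pixels to [0, 1]
    transforms.Normalize([0.5], [0.5]),  # shift to [-1, 1], as the VAE expects
])

@torch.no_grad()
def encode(path: str) -> torch.Tensor:
    """Return the mean of the VAE's posterior latent for one image."""
    x = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    return vae.encode(x).latent_dist.mean

z_clean = encode("clean.png")      # original image (placeholder path)
z_prot = encode("protected.png")   # same image with protective perturbation

# Compare the two latents: a large L2 distance (or low cosine similarity)
# despite imperceptible pixel-level noise suggests the perturbation acts
# mainly by distorting the latent representation.
l2 = (z_clean - z_prot).flatten().norm().item()
cos = torch.nn.functional.cosine_similarity(
    z_clean.flatten(), z_prot.flatten(), dim=0
).item()
print(f"latent L2 distance: {l2:.3f}, cosine similarity: {cos:.3f}")
```

If the perturbation works mainly through the latent space, the latent distance should be large relative to the imperceptible change in pixel space, which is the effect the abstract's qualitative and quantitative experiments examine.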
Lay Summary: To prevent generative AI models from being customized on unauthorized personal images, researchers add imperceptible noise to those images, weakening the model's ability to learn from them.
However, we show that these “protective perturbations” are not robust enough.
Our proposed method, CAT, uses contrastive adversarial training to extract meaningful information from protected images despite the added noise.
This reveals serious weaknesses in current protection strategies and highlights the need for stronger mechanisms to truly safeguard data in AI training.
Link To Code: https://github.com/senp98/CAT
Primary Area: Deep Learning->Robustness
Keywords: Latent Diffusion Models, Protective Perturbations, Adversarial Training
Submission Number: 15307