Keywords: text-to-image diffusion model, continuous concept removal, responsible AI
Abstract: Text-to-image diffusion models have shown an impressive ability to generate high-quality images from input textual descriptions/prompts. However, concerns have been raised about the potential for these models to create content that infringes on copyrights or depicts disturbing subject matter.
Removing specific concepts from these models is a promising solution to this issue. However, existing concept removal methods do not work well in practical but challenging scenarios where concepts must be removed continuously, one after another. Specifically, these methods lead to poor alignment between the text prompts and the generated images after the continuous removal process.
To address this issue, we propose a novel concept removal approach called CCRT, which includes a dedicated knowledge distillation paradigm.
CCRT constrains the text-image alignment behavior during the continuous concept removal process by using a set of text prompts.
These prompts are generated by our genetic algorithm, which employs a dedicated fuzzing strategy.
To evaluate the effectiveness of CCRT, we conduct extensive experiments that cover the removal of various concepts and are assessed with both algorithmic metrics and human studies.
The results demonstrate that CCRT can effectively remove the targeted concepts from the model in a continuous manner while maintaining high image generation quality (e.g., text-image alignment).
The code of CCRT is available at https://github.com/wssun/CCRT.
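The prompt search mentioned in the abstract (a genetic algorithm with a fuzzing-style mutation step) can be illustrated with a minimal sketch. This is not the CCRT implementation: the function names, the word-level mutation operators, the crossover rule, and the toy fitness function are all assumptions made for illustration. In a real setting the fitness would presumably score how useful a prompt is for constraining text-image alignment, which requires the diffusion model and is left as a caller-supplied function here.

```python
import random
from typing import Callable, List


def mutate(prompt: str, vocab: List[str], rng: random.Random) -> str:
    """Fuzzing-style mutation (assumed): insert, replace, or delete one word."""
    words = prompt.split()
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not words:
        words.insert(rng.randint(0, len(words)), rng.choice(vocab))
    elif op == "replace":
        words[rng.randrange(len(words))] = rng.choice(vocab)
    elif len(words) > 1:  # delete, but never produce an empty prompt
        del words[rng.randrange(len(words))]
    return " ".join(words)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Splice a prefix of prompt a onto a suffix of prompt b (assumed rule)."""
    wa, wb = a.split(), b.split()
    child = wa[: rng.randint(0, len(wa))] + wb[rng.randint(0, len(wb)):]
    return " ".join(child) if child else a


def evolve_prompts(
    seeds: List[str],
    vocab: List[str],
    fitness: Callable[[str], float],  # hypothetical scoring function
    generations: int = 5,
    population_size: int = 8,
    seed: int = 0,
) -> List[str]:
    """Evolve a prompt population; returns prompts sorted best-first."""
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        children = []
        while len(children) < population_size:
            parent_a = rng.choice(population)
            parent_b = rng.choice(population)
            children.append(mutate(crossover(parent_a, parent_b, rng), vocab, rng))
        # Elitist selection over parents and children; ties broken by text
        # so that the result is deterministic for a fixed seed.
        pool = sorted(set(population + children), key=lambda p: (-fitness(p), p))
        population = pool[:population_size]
    return population


def toy_fitness(prompt: str) -> float:
    """Stand-in fitness (assumption): reward prompts with more distinct words."""
    return float(len(set(prompt.split())))


if __name__ == "__main__":
    seeds = ["a photo of a dog", "a painting of a city"]
    vocab = ["red", "small", "night", "forest", "sketch", "river"]
    for prompt in evolve_prompts(seeds, vocab, toy_fitness):
        print(prompt)
```

Because selection keeps the best of both parents and children, the top-ranked prompt never scores lower than the best seed under the supplied fitness. Swapping `toy_fitness` for a model-based score is the only change needed to adapt the loop.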
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 5869