Keywords: Safety
Abstract: Concept erasure, which aims to prevent pretrained text-to-image models from
generating content associated with semantically harmful concepts (i.e., target
concepts), is receiving increasing attention. State-of-the-art methods formulate this task
as an optimization problem: they align all target concepts with semantically harmless
anchor concepts and apply closed-form solutions to update the model
accordingly. While these closed-form methods are efficient, we argue that they
have two overlooked limitations: 1) They often result in incomplete erasure
due to “non-zero alignment residual”, especially when text prompts are relatively
complex. 2) They may suffer from generation quality degradation as they
always concentrate parameter updates in a few deep layers. To address these issues,
we propose ErasePro, a novel closed-form method designed for more
complete concept erasure and better preservation of overall generative quality. Specifically,
ErasePro first introduces a strict zero-residual constraint into the optimization
objective, ensuring perfect alignment between target and anchor concept features
and enabling more complete erasure. Second, it employs a progressive,
layer-wise update strategy that gradually transfers target concept features to those of the
anchor concept from shallow to deep layers. As the depth increases, the required
parameter changes diminish, thereby reducing deviations in sensitive deep layers
and preserving generative quality. Empirical results across different concept
erasure tasks (including instance, art style, and nudity erasure) demonstrate the
effectiveness of ErasePro.
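
To illustrate what a zero-residual constraint can look like in a closed-form edit, here is a minimal sketch. It is not the paper's stated formulation. It assumes a single linear projection W (for example, a cross-attention key or value matrix), target concept embeddings stacked as the columns of C, anchor concept embeddings stacked as the columns of A, and C with full column rank. All of these symbols, and the regularization weight \lambda, are assumptions introduced here for illustration.

```latex
\begin{aligned}
\text{Soft alignment:}\quad
  & \min_{W'} \; \|W'C - WA\|_F^2 + \lambda \|W' - W\|_F^2
  \;\Longrightarrow\;
  W' = \left(WAC^\top + \lambda W\right)\left(CC^\top + \lambda I\right)^{-1}, \\[4pt]
\text{Zero residual:}\quad
  & \min_{W'} \; \|W' - W\|_F^2 \quad \text{s.t.} \quad W'C = WA
  \;\Longrightarrow\;
  W' = W + W(A - C)\left(C^\top C\right)^{-1} C^\top .
\end{aligned}
```

Under these assumptions, substituting the second solution gives W'C = WC + W(A - C) = WA exactly. The soft form with \lambda > 0 satisfies W'C = WA only when WC = WA already holds, so it generally leaves a non-zero alignment residual.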
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9338