Abstract: Diffusion models dominate text-to-image generation, yet they can produce undesirable outputs, including explicit content or private data. To mitigate this, concept ablation techniques have been explored to limit the generation of certain concepts. In this paper, we reveal that information about an erased concept persists in the model and that images of the erased concept can still be generated from the right latent. Using inversion methods, we show that there exist latent seeds capable of generating high-quality images of erased concepts. Moreover, we show that these latents have likelihoods that overlap with those of latents for images outside the erased concept. We extend this to demonstrate that for every image in the erased concept set, we can generate many seeds that produce the erased concept. Given the vast space of latents capable of generating ablated concept images, our results suggest that fully erasing concept information may be intractable, highlighting possible vulnerabilities in current concept ablation techniques.
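The abstract does not say how latent likelihood is measured. One natural reading, assumed here, is the log-density of a latent seed under the standard Gaussian prior N(0, I) from which diffusion samplers draw their initial noise. The sketch below computes that log-density and checks whether a given latent's value falls inside the range observed for ordinarily sampled latents. The latent dimension and the "inverted" latent are hypothetical stand-ins: no actual inversion is performed here.

```python
import math
import random


def gaussian_log_likelihood(latent):
    """Log-density of a flattened latent under an isotropic standard normal N(0, I).

    Assumption: initial noise is drawn from N(0, I), as in standard DDPM/DDIM
    sampling; the latent is given as a flat list of floats.
    """
    d = len(latent)
    sq_norm = sum(x * x for x in latent)
    return -0.5 * (d * math.log(2 * math.pi) + sq_norm)


def sample_latent(d, rng):
    """Draw a latent seed z ~ N(0, I) of dimension d."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def within_range(value, reference):
    """True if value lies between the min and max of the reference log-likelihoods."""
    return min(reference) <= value <= max(reference)


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4 * 64 * 64  # hypothetical latent shape (channels x height x width), flattened
    reference = [gaussian_log_likelihood(sample_latent(d, rng)) for _ in range(100)]
    # Hypothetical stand-in for a latent recovered by inverting an erased-concept image.
    inverted = sample_latent(d, rng)
    print(within_range(gaussian_log_likelihood(inverted), reference))
```

In practice the `inverted` latent would come from running an inversion method on an erased-concept image. Comparing against a reference distribution, rather than a fixed threshold, reflects the claim that such latents are not outliers under the prior.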