Keywords: Continual Unlearning, Diffusion Model, Image Generation, Machine Unlearning
TL;DR: We present the first systematic study of continual unlearning for image generation, reflecting real-world scenarios where unlearning requests arrive sequentially rather than all at once.
Abstract: Machine unlearning—the ability to remove designated concepts from a pre-trained
model—has advanced rapidly, particularly for text-to-image diffusion models.
However, existing methods typically assume that unlearning requests arrive all
at once, whereas in practice they often arrive sequentially. We present the first
systematic study of continual unlearning in text-to-image diffusion models and
show that popular unlearning methods suffer from rapid utility collapse: after only
a few requests, models forget retained knowledge and generate degraded images.
We trace this failure to cumulative parameter drift from the pre-training weights
and argue that regularization is crucial to addressing it. To this end, we study a
suite of add-on regularizers that (1) mitigate drift and (2) remain compatible with
existing unlearning methods. Beyond generic regularizers, we show that semantic
awareness is essential for preserving concepts close to the unlearning target, and
propose a gradient-projection method that constrains parameter drift to be
orthogonal to the subspace spanned by those concepts. This substantially improves
continual unlearning performance and complements other regularizers for further
gains. Taken together, our
study establishes continual unlearning as a fundamental challenge in text-to-image
generation and provides insights, baselines, and open directions for advancing safe
and accountable generative AI.
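The gradient-projection idea described in the abstract can be illustrated with a minimal sketch: given a basis for the subspace of preserved-concept directions, each update is projected so that its component inside that subspace is removed. This is an illustrative assumption about the mechanism, not the paper's implementation; the function name and toy dimensions are hypothetical.

```python
import numpy as np

def project_orthogonal(grad, basis):
    """Remove from `grad` its component inside the subspace spanned by
    the rows of `basis`, so the update is orthogonal to the directions
    of preserved concepts. (Illustrative sketch, not the paper's code.)"""
    Q, _ = np.linalg.qr(basis.T)       # orthonormalize subspace directions
    return grad - Q @ (Q.T @ grad)     # subtract projection onto subspace

# Toy example: the subspace is spanned by e1, so the projected gradient
# has no e1 component and leaves that direction untouched.
basis = np.array([[1.0, 0.0, 0.0]])
g = np.array([2.0, 3.0, -1.0])
g_proj = project_orthogonal(g, basis)  # -> array([0., 3., -1.])
```

Applying such a projection at every unlearning step would keep parameter drift out of the preserved-concept subspace while still allowing movement in the remaining directions.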
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 5899