Repair Aware Forgetting: An Iterative Approach to Unlearning in T2I Diffusion Models

15 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: unlearning, image generation, text-to-image, diffusion models, preference alignment
TL;DR: PURE is a repair-aware unlearning approach for T2I diffusion that uses a negative-only preference based forget loss, with an alternating forget-then-repair schedule, to erase targeted content while preserving retain accuracy and low FID.
Abstract: Text-to-image diffusion models trained on web-scale data can reproduce unsafe, private, or copyrighted content. We seek an unlearning procedure that removes such content while explicitly preserving benign performance. We formulate unlearning as repair-aware constrained optimization and introduce PURE (Preference-based UnleaRning in tExt-to-image diffusion). PURE operationalizes this with three ideas: (i) a distributional trust region around a strong reference model, enforced via a KL penalty, so that forgetting cannot cause drift on retain prompts; (ii) a diffusion-tailored, negative-only preference objective that downweights unsafe generations without requiring paired safe examples; and (iii) an alternating schedule of short forgetting steps and lightweight repair steps on retain data that yields self-stabilizing updates and keeps image quality high. On Imagenette, PURE achieves near-perfect unlearning in 100 steps with near-perfect retain accuracy and the best FID among baselines. On I2P, it reduces NSFW generations by over 50% relative to the prior state of the art, using only 50 forget samples on a single GPU. PURE is simple to implement and both sample- and compute-efficient. Overall, PURE consistently outperforms ESD, FMN, and SalUn on both unlearning efficacy and fidelity, demonstrating a practical path to safe T2I diffusion without retraining or paired supervision.
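The three ingredients above can be illustrated with a toy scalar sketch. This is not the paper's implementation: the loss forms (a DPO-style preference term restricted to the losing/unsafe sample, a squared-difference stand-in for the KL penalty) and all hyperparameter names (`beta`, `lam`, `k_forget`, `k_repair`) are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_loss(logp_theta_unsafe, logp_ref_unsafe, beta=0.5):
    """Negative-only preference loss (hypothetical DPO-style form with
    only a losing sample): the loss is lower when the current model
    assigns less probability to the unsafe generation than the
    reference model does."""
    margin = logp_theta_unsafe - logp_ref_unsafe
    return -math.log(sigmoid(-beta * margin))

def trust_region_penalty(logp_theta_retain, logp_ref_retain, lam=1.0):
    """Scalar stand-in for the KL trust region: penalize the model for
    moving away from the reference on retain prompts."""
    return lam * (logp_theta_retain - logp_ref_retain) ** 2

def alternating_schedule(n_rounds=3, k_forget=4, k_repair=1):
    """Alternate short blocks of forgetting steps with lightweight
    repair steps on retain data (block lengths are assumptions)."""
    plan = []
    for _ in range(n_rounds):
        plan += ["forget"] * k_forget + ["repair"] * k_repair
    return plan
```

In this toy, driving `logp_theta_unsafe` below `logp_ref_unsafe` lowers `forget_loss`, while `trust_region_penalty` is zero only when the model matches the reference on retain prompts, capturing the constrained-optimization framing in miniature.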
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 5936