Text-to-Unlearn: Robust Concept Removal in GANs via Text Prompts

ICLR 2026 Conference Submission 22607 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Machine Unlearning, Cross-Modal Unlearning
TL;DR: We propose a method to unlearn concepts from a GAN using only text prompts.
Abstract: State-of-the-art generative models exhibit powerful image-generation capabilities, raising ethical and legal challenges for the providers that deploy them. Consequently, Content Removal Techniques (CRTs) have emerged to control model outputs without full retraining. However, unlearning in Generative Adversarial Networks (GANs) remains largely unexplored. We propose Text-to-Unlearn, a novel framework that selectively unlearns concepts from pre-trained GANs using only text prompts, supporting feature and identity unlearning as well as fine-grained tasks such as expression and multi-attribute removal in models trained on human faces. Our approach leverages natural-language descriptions to guide unlearning without additional datasets or supervised fine-tuning, offering a scalable solution. To evaluate effectiveness, we introduce an automated unlearning assessment built on state-of-the-art image-text alignment metrics and propose a new metric, the degree of unlearning. We further assess robustness by introducing a prompt boundary attack that attempts to subvert unlearning. Our results show that Text-to-Unlearn achieves robust unlearning, resisting adversarial attempts to recover erased concepts while preserving model utility. To our knowledge, this is the first cross-modal unlearning framework for GANs, advancing the management of generative model behavior.
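The abstract does not spell out the optimization, but a common pattern for text-guided editing of GANs is to steer the generator with a CLIP text embedding. The sketch below illustrates one plausible form of prompt-driven unlearning: penalize alignment between generated images and the target concept's text embedding while anchoring outputs to a frozen copy of the original generator. The `generator` stub, the prompt, the loss weights, and the training loop are illustrative assumptions, not the authors' method.

```python
# Minimal sketch of CLIP-guided concept unlearning in a GAN (assumed form;
# the abstract does not specify the paper's actual objective or architecture).
import copy

import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()
clip_model.requires_grad_(False)  # CLIP only guides; it is never updated

# Stand-in for a pre-trained face GAN (e.g., StyleGAN); purely illustrative.
generator = torch.nn.Sequential(
    torch.nn.Linear(512, 3 * 224 * 224), torch.nn.Tanh()
).to(device)
anchor = copy.deepcopy(generator).eval()  # frozen copy used to preserve utility

# Text prompt describing the concept to erase (hypothetical example).
tokens = clip.tokenize(["a smiling face"]).to(device)
with torch.no_grad():
    concept = F.normalize(clip_model.encode_text(tokens), dim=-1)

opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
for step in range(200):
    z = torch.randn(8, 512, device=device)
    # Real code would apply CLIP's image preprocessing here.
    imgs = generator(z).view(-1, 3, 224, 224)
    emb = F.normalize(clip_model.encode_image(imgs), dim=-1)

    unlearn = (emb @ concept.T).mean()  # push outputs away from the concept
    with torch.no_grad():
        ref = anchor(z).view(-1, 3, 224, 224)
    utility = F.mse_loss(imgs, ref)     # stay close to the original generator

    loss = unlearn + 0.5 * utility      # 0.5 is an arbitrary illustrative weight
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the same spirit, the proposed degree-of-unlearning metric could plausibly be instantiated as the relative drop in image-text alignment for the erased concept between samples drawn before and after unlearning; the paper's exact definition may differ.

```python
@torch.no_grad()
def degree_of_unlearning(clip_model, imgs_before, imgs_after, prompt, device="cpu"):
    """Relative drop in CLIP alignment with `prompt` after unlearning.

    1.0 means alignment with the erased concept vanished; 0.0 means no change.
    This is an assumed formulation, not necessarily the paper's.
    """
    tokens = clip.tokenize([prompt]).to(device)
    txt = F.normalize(clip_model.encode_text(tokens), dim=-1)
    s_before = (F.normalize(clip_model.encode_image(imgs_before), dim=-1) @ txt.T).mean()
    s_after = (F.normalize(clip_model.encode_image(imgs_after), dim=-1) @ txt.T).mean()
    return max(0.0, ((s_before - s_after) / s_before.clamp(min=1e-8)).item())
```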
Primary Area: generative models
Submission Number: 22607