INPO: Image-based Negative Preference Optimization for Concept Erasure in Text-to-Image Diffusion Models

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Diffusion Model, Concept Erasure
Abstract: Text-to-image diffusion models have achieved remarkable generative performance, yet they are susceptible to memorizing and reproducing undesirable concepts, such as NSFW content or copyrighted material. While concept erasure has emerged as a promising approach to removing undesirable concepts from pre-trained models, existing methods still suffer from prompt dependence, architecture dependence, and unstable training dynamics, which limit their effectiveness and generalization. In this work, we propose Image-based Negative Preference Optimization (INPO), a novel model-agnostic framework for concept erasure that unifies joint image–text supervision under a principled preference optimization paradigm. By formulating the target concept as a negative preference, INPO inherits the stable optimization dynamics of Negative Preference Optimization (NPO), thereby mitigating the instability of prior gradient-ascent-based methods. To achieve precise and controllable erasure, INPO further incorporates a concept mask for localized suppression and an adaptive negative scaling strategy that dynamically modulates optimization strength according to erasure progress. Extensive experiments on the latest FLUX model demonstrate that INPO achieves precise and consistent erasure across a variety of tasks, including objects, IP, styles, and NSFW content, while preserving the model's overall generative capabilities. These results highlight the robustness, reliability, and practical applicability of INPO for safe and controllable image generation.
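To make the abstract's three ingredients concrete, the following is a minimal NumPy sketch of an NPO-style negative-preference loss combined with a concept mask and an adaptive negative scaling factor. All function names, the use of masked denoising error as a log-likelihood proxy, and the particular sigmoid scaling form are illustrative assumptions, not the authors' actual implementation; the `npo_term` itself follows the standard NPO objective $(2/\beta)\,\log\!\big(1 + (\pi_\theta/\pi_{\mathrm{ref}})^\beta\big)$.

```python
import numpy as np

def npo_term(logp_theta, logp_ref, beta=0.1):
    # Standard NPO objective: (2/beta) * log(1 + (pi_theta / pi_ref)^beta),
    # written with log-probabilities via log1p(exp(.)) for stability.
    # Minimizing it pushes logp_theta below logp_ref, with gradients that
    # vanish as the gap grows -- the source of NPO's stable dynamics.
    return (2.0 / beta) * np.log1p(np.exp(beta * (logp_theta - logp_ref)))

def inpo_loss(eps_theta, eps_ref, eps_true, mask, beta=0.1, tau=1.0):
    # Hypothetical proxy: masked per-pixel denoising error stands in for the
    # (negative) log-likelihood of generating the target concept. The binary
    # concept mask restricts suppression to the concept's spatial region.
    logp_theta = -np.mean(mask * (eps_theta - eps_true) ** 2)
    logp_ref = -np.mean(mask * (eps_ref - eps_true) ** 2)
    # Adaptive negative scaling (assumed form): once the model's likelihood
    # has dropped well below the frozen reference, "progress" grows and the
    # sigmoid shrinks the optimization strength, preventing over-erasure.
    progress = logp_ref - logp_theta  # > 0 once erasure has taken effect
    scale = 1.0 / (1.0 + np.exp(progress / tau))
    return scale * npo_term(logp_theta, logp_ref, beta)
```

In practice `eps_theta` and `eps_ref` would be the noise predictions of the fine-tuned and frozen reference diffusion models on a noised image of the target concept; here they are just arrays, which is enough to see how the mask and the scaling factor modulate the basic NPO term.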
Supplementary Material: zip
Primary Area: generative models
Submission Number: 24245