Minimal, Local, and Robust: Embedding-Only Edits for Implicit Bias in T2I Models

ACL ARR 2025 May Submission1895 Authors

18 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Implicit assumptions and priors are often necessary in text-to-image generation, especially when textual prompts lack sufficient context. However, these assumptions can reflect societal biases, low variance, or outdated concepts in the training data. We present Embedding-only Editing (EmbEdit), a method designed to efficiently edit implicit assumptions and priors in a text-to-image model without affecting unrelated objects or degrading overall performance. Given a "source" prompt (e.g., "nurse") that elicits an assumption (e.g., a female nurse) and a "destination" prompt or distribution (e.g., an equal chance of either gender), EmbEdit fine-tunes only the word token embedding (WTE) of the target object (i.e., the WTE of the token "nurse"). Our method prevents unintended effects on other objects in the model's knowledge base, as the WTEs of unrelated objects and the model weights remain unchanged. Further, our method can be applied to any text-to-image model with a text encoder. It is highly efficient, modifying only 768, 2048, and 4864 parameters for Stable Diffusion 1.4, Stable Diffusion XL, and FLUX, respectively, matching each model's WTE dimension. Additionally, an edit can easily be reversed by restoring the original WTE layer. The results show that EmbEdit outperforms previous methods across models, tasks, and editing scenarios (both single and sequential multiple edits), achieving at least a 6.01% improvement (from 87.17% to 93.18%).
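The core idea of the abstract, updating only one row of the word-token-embedding matrix while everything else stays frozen, can be sketched as follows. This is an illustrative toy, not the authors' code: the quadratic loss pulling the target WTE toward a "destination" embedding is a hypothetical stand-in for the paper's actual editing objective, and the dimensions are shrunk for clarity (SD 1.4's real WTE dimension is 768).

```python
import numpy as np

# Toy sketch of an embedding-only edit: fine-tune a single WTE row,
# leaving all other rows and all model weights untouched.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 8                 # toy sizes; real WTE dim is e.g. 768
wte = rng.normal(size=(vocab_size, dim))
original = wte.copy()                    # saved so the edit can be reversed

target_token = 3                         # hypothetical token id for "nurse"
dest = rng.normal(size=dim)              # hypothetical "destination" embedding

lr = 0.1
for _ in range(200):
    # Placeholder objective: ||w - dest||^2; gradient is 2 (w - dest).
    grad = 2.0 * (wte[target_token] - dest)
    wte[target_token] -= lr * grad       # update ONLY this one row

# Unrelated tokens are provably unaffected, since only one row changed;
# restoring original[target_token] would undo the edit entirely.
others_unchanged = np.allclose(
    np.delete(wte, target_token, axis=0),
    np.delete(original, target_token, axis=0),
)
```

Because the edit touches exactly `dim` parameters, it is cheap, composable across sequential edits on different tokens, and trivially reversible by restoring the saved row.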
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Text-to-Image, model editing, model bias
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1895