DeRO: Decompose and Recompose Text Optimization for Correcting Semantics in Negation-Aware Image Generation
Keywords: text embedding, diffusion, negation, optimization, multimodal, bias
Abstract: Prompt embeddings in text-to-image diffusion models have improved through the use of multiple text encoders and attention masking over text tokens, leading to stronger text–image alignment and better controllability. However, alignment remains weak for negation-based prompts. In this paper, we analyze text embeddings and show that implicit word-level biases cause negation expressions to be ignored. We present DeRO, which optimizes the original prompt by identifying its precise semantics. The method applies SVD to the prompt embedding together with embeddings of semantically similar auxiliary prompts to obtain the corresponding semantic subspace. While projecting the text embedding onto this subspace improves alignment, naïve projection discards substantial information contained in the original prompt embedding. We therefore perform a one-time optimization that matches the token vectors of implicitly biased words and negation adverbs to the projected embedding, yielding an optimal prompt embedding that is semantically aligned with the given negation prompt while preserving unrelated words. Through experiments on both object and concept negation benchmarks, we show that DeRO achieves approximately a 21% performance improvement for object erasure and a 12% improvement for concept erasure, consistently outperforming prior methods while maintaining superior computational efficiency.
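The subspace step described in the abstract can be sketched as follows. This is an illustrative outline, not the authors' implementation: the embedding dimension, the number of auxiliary prompts, the subspace rank `k`, and the random stand-in embeddings are all assumptions for demonstration.

```python
import numpy as np

# Illustrative sketch: derive a "semantic subspace" from a prompt embedding
# plus auxiliary, semantically similar prompt embeddings via SVD, then
# orthogonally project the original embedding onto that subspace.
rng = np.random.default_rng(0)
dim = 768                              # hypothetical text-encoder width
prompt_emb = rng.normal(size=dim)      # stand-in for the prompt embedding
aux_embs = rng.normal(size=(4, dim))   # stand-ins for auxiliary prompts

# Stack all embeddings; the top right-singular vectors of the stack span
# the directions the prompts share, i.e. the semantic subspace.
stack = np.vstack([prompt_emb, aux_embs])
_, _, vt = np.linalg.svd(stack, full_matrices=False)
k = 3                                  # subspace rank (a free choice here)
basis = vt[:k]                         # (k, dim) orthonormal rows

# Orthogonal projection of the prompt embedding onto the subspace.
projected = basis.T @ (basis @ prompt_emb)
```

As the abstract notes, such a projection alone loses information from the original embedding; DeRO's subsequent one-time optimization (not shown here) re-matches token vectors against this projected target while leaving unrelated words untouched.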
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: text embedding optimization, diffusion, negation, bias
Languages Studied: Multimodality, Semantics, Sentence-level Semantics, Textual Inference, SVD Analysis, Text Embedding Composition
Submission Number: 3886