Keywords: text-to-image generation, diffusion models, concept learning
Abstract: Text-to-image models are increasingly used in design workflows, yet articulating nuanced design intent through text alone remains difficult. This work proposes a method that extracts a visual attribute from a reference image and injects it directly into the generation pipeline. The method optimizes a text token to represent only the target attribute, using a custom training prompt and two novel embeddings: a distilled embedding and a residual embedding. This approach can extract a wide range of attributes, including the shape, material, or color of an object, as well as the camera angle of the image. The method is validated on diverse target attributes and text prompts drawn from a newly constructed dataset, and it outperforms existing approaches in selectively extracting and applying target attributes across varied contexts. Ultimately, the proposed method enables intuitive and controllable text-to-image generation, streamlining the design process.
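To make the token-optimization idea concrete, below is a minimal, self-contained PyTorch sketch of textual-inversion-style attribute learning: a single learnable token embedding is optimized, with a frozen text encoder and denoiser, to reconstruct a reference image under the standard diffusion denoising objective. This is not the authors' implementation; ToyTextEncoder, ToyDenoiser, the dimensions, the noise schedule, and the training loop are placeholder assumptions for illustration, and the sketch omits the paper's distilled and residual embeddings, whose construction is not specified in the abstract.

```python
# Minimal sketch (assumptions only, not the paper's code): learn one token
# embedding so that a prompt containing it reconstructs a reference image
# under the usual diffusion noise-prediction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, IMG_DIM, T_MAX = 64, 256, 1000  # toy sizes, chosen arbitrarily

class ToyTextEncoder(nn.Module):
    """Placeholder text encoder: maps a sequence of token embeddings to a pooled conditioning vector."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMB_DIM, EMB_DIM)
    def forward(self, token_embs):            # (seq_len, EMB_DIM)
        return self.proj(token_embs).mean(0)  # (EMB_DIM,)

class ToyDenoiser(nn.Module):
    """Placeholder conditional denoiser epsilon_theta(x_t, t, c)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG_DIM + EMB_DIM + 1, 512), nn.SiLU(),
                                 nn.Linear(512, IMG_DIM))
    def forward(self, x_t, t, cond):
        t_feat = t.float() / T_MAX            # normalized timestep, shape (1,)
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))

torch.manual_seed(0)
text_encoder, denoiser = ToyTextEncoder(), ToyDenoiser()
for p in list(text_encoder.parameters()) + list(denoiser.parameters()):
    p.requires_grad_(False)                   # pretrained components stay frozen

# Fixed embeddings standing in for the words of a training prompt such as
# "a photo of an object with <v*>"; only the new token is trainable.
prompt_context = torch.randn(5, EMB_DIM)
attr_token = nn.Parameter(torch.randn(EMB_DIM) * 0.01)  # learnable attribute token

ref_latent = torch.randn(IMG_DIM)             # stand-in for the encoded reference image
opt = torch.optim.AdamW([attr_token], lr=1e-3)

for step in range(200):
    t = torch.randint(0, T_MAX, (1,))
    noise = torch.randn(IMG_DIM)
    alpha = 1.0 - t.float() / T_MAX           # toy linear noise schedule
    x_t = alpha.sqrt() * ref_latent + (1 - alpha).sqrt() * noise

    # Condition on the prompt with the learnable token appended.
    cond = text_encoder(torch.cat([prompt_context, attr_token.unsqueeze(0)], dim=0))
    loss = F.mse_loss(denoiser(x_t, t, cond), noise)  # standard epsilon-prediction loss

    opt.zero_grad()
    loss.backward()                           # gradients reach only attr_token
    opt.step()
```

After optimization, the learned token can in principle be placed into new prompts (e.g. a prompt describing a different object) so that generations inherit the extracted attribute, which is the behavior the abstract claims for its selective attribute transfer.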
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22836