Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing

Yoonjeon Kim; Soohyun Ryu; Yeonsung Jung; Hyunkoo Lee; Joowon Kim; June Yong Yang; Jaeryong Hwang; Eunho Yang

23 Sept 2024 (modified: 15 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: evaluation metric, text-guided image editing, multi-modal representation

Abstract: The development of vision-language and generative models has significantly advanced text-guided image editing, which seeks preservation of core elements in the source image while implementing modifications based on the target text. However, no evaluation metric is specifically tailored to text-guided image editing, and existing metrics are limited in their ability to balance preservation and modification. In particular, our analysis reveals that CLIPScore, the most commonly used metric, tends to favor modification, resulting in inaccurate evaluations. To address this problem, we propose AugCLIP, a simple yet effective evaluation metric that balances preservation and modification. AugCLIP first uses a multi-modal large language model (MLLM) to produce detailed descriptions that capture visual attributes of the source image and the target text, which lets the metric incorporate richer information. It then estimates the modification vector that transforms the source image to align with the target text with minimum alteration, computed as a projection onto the hyperplane that separates the source and target attributes. Additionally, we account for the relative importance of each attribute by considering the interdependent relationships among visual attributes. Our extensive experiments on five benchmark datasets, encompassing a diverse range of editing scenarios, demonstrate that AugCLIP aligns more closely with human evaluation standards than existing metrics. The code for evaluation will be open-sourced to contribute to the community.
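The projection step mentioned in the abstract can be illustrated with a minimal geometric sketch. It assumes a separating hyperplane {z : w·z + b = 0} is already given. The abstract does not say how the hyperplane is fitted or how attributes are weighted, so `w`, `b`, and the function name below are hypothetical, and this is not the authors' implementation. It only shows the standard fact that the smallest displacement moving a point onto a hyperplane lies along the hyperplane's normal.

```python
import numpy as np

def project_onto_hyperplane(x, w, b):
    """Return the closest point to x on {z : w.z + b = 0} and the displacement.

    x : embedding to move (hypothetical stand-in for a source image embedding)
    w : hyperplane normal (assumed given; how it is fitted is not covered here)
    b : hyperplane offset (assumed given)
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # Signed offset of x from the hyperplane, scaled by the squared norm of w.
    scaled_offset = (w @ x + b) / (w @ w)
    # Minimum-norm displacement: move against the normal by that offset.
    delta = -scaled_offset * w
    return x + delta, delta

# Toy 2-D example: the hyperplane is the vertical line z[0] = 1.
x = np.array([3.0, 1.0])
w = np.array([1.0, 0.0])
b = -1.0
projected, delta = project_onto_hyperplane(x, w, b)
```

In this toy case only the first coordinate changes, since the displacement is parallel to `w`. Any other vector that reaches the hyperplane from `x` has a larger norm.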

Supplementary Material: pdf

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 3038
