Exploring GPT-4 Vision for Text-to-Image Synthesis Evaluation

Published: 19 Mar 2024, Last Modified: 03 May 2024, Tiny Papers @ ICLR 2024 Present, License: CC BY 4.0
Keywords: Text-to-Image Synthesis Evaluation, GPT-4 Vision, Prompt Engineering
Abstract: This paper addresses the critical need for more accurate evaluation methods in text-to-image synthesis. Although the standard CLIPScore metric reflects text-image alignment to some extent, it is often inconsistent with human perception. We propose using GPT-4 Vision as a novel evaluation standard, capable of interpreting nuances in text and images in a manner akin to human cognition. Our study focuses on the pivotal role of prompt design in maximizing GPT-4 Vision's effectiveness and presents a systematic discussion of prompt construction. Empirical evaluations demonstrate that GPT-4 Vision, combined with our prompt-design strategy, aligns more closely with human judgment.
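For reference, the CLIPScore baseline mentioned above is, as originally defined for image captioning (Hessel et al., 2021), a rescaled and clipped cosine similarity between CLIP's image and text embeddings: CLIPScore = w * max(cos(image, text), 0), with w = 2.5. The minimal sketch below assumes the embeddings have already been produced by a CLIP encoder; the toy vectors in the usage example are hypothetical stand-ins for real encoder outputs. It illustrates only the baseline metric, not the GPT-4 Vision evaluation procedure proposed in this paper.

```python
import numpy as np


def clip_score(image_emb, text_emb, w=2.5):
    """Sketch of CLIPScore: w * max(cosine(image_emb, text_emb), 0).

    Assumes image_emb and text_emb are 1-D embedding vectors that were
    already computed by a CLIP image encoder and text encoder.
    """
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    # Cosine similarity between the two embeddings.
    cos = float(image_emb @ text_emb) / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    )
    # Negative similarities are clipped to 0, then rescaled by w.
    return w * max(cos, 0.0)


# Hypothetical toy embeddings, not real CLIP outputs.
print(clip_score([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(clip_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(clip_score([1.0, 0.0], [-1.0, 0.0]))  # opposite, clipped to 0
```

Because the score depends only on a single global embedding similarity, it cannot inspect fine-grained details such as object counts or spatial relations, which is one plausible reason for its gap with human perception.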
Submission Number: 66