Evaluation for Text-to-Image Generation from a Creativity Perspective

ACL ARR 2025 February Submission 1946 Authors

14 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: In recent years, driven by advances in diffusion processes, Text-to-Image (T2I) models have developed rapidly. However, evaluating T2I models remains a significant challenge. While previous research has thoroughly assessed the quality of generated images and image-text alignment, the creativity of these models has received little study. In this work, we define the creativity of T2I models based on previous definitions of machine creativity. We also propose corresponding metrics and design a method to test their reliability. Additionally, we create a fully automated pipeline that uses text vector retrieval and the text synthesis capabilities of large language models (LLMs) to convert existing image-text datasets into the benchmarks needed for evaluating creativity. Finally, we conduct a series of tests and analyses of the creativity evaluation methods and of the factors that influence model creativity. The code and benchmark will be released.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality, evaluation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1946