Holistic Evaluation of Text-to-Image Models

Published: 26 Sept 2023, Last Modified: 02 Nov 2023
NeurIPS 2023 Datasets and Benchmarks Spotlight
Keywords: text-to-image, image generation, multimodal, holistic evaluation, benchmarking, human evaluation
TL;DR: We present a holistic evaluation framework for text-to-image generation models, assessing their performance across 12 aspects that matter for real-world deployment. We release all the generated images and evaluation results.
Abstract: The stunning qualitative improvement of text-to-image models has led to widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects: text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/latest and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase.
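To make the text-image alignment aspect concrete, below is a minimal sketch of CLIPScore (Hessel et al., 2021), a standard automated alignment metric of the kind HEIM reports alongside human judgments. The sketch uses the Hugging Face transformers CLIP API; the model checkpoint and the 2.5x rescaling follow the CLIPScore paper and are illustrative assumptions, not necessarily HEIM's exact configuration.

```python
# Minimal CLIPScore sketch: cosine similarity between CLIP text and image
# embeddings, rescaled per Hessel et al. (2021). The checkpoint choice is
# an assumption for illustration, not necessarily HEIM's exact setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(prompt: str, image: Image.Image) -> float:
    """Return CLIPScore between a text prompt and a generated image."""
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Normalize embeddings, then take their cosine similarity.
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    cosine = (text_emb * image_emb).sum().item()
    return 2.5 * max(cosine, 0.0)  # CLIPScore rescaling

# Example: clip_score("a red apple on a wooden table", Image.open("gen.png"))
```

A higher score indicates better agreement between the prompt and the generated image; in practice such automated scores are averaged over a scenario's prompts and compared against human ratings.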
Supplementary Material: pdf
Submission Number: 658