Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: text-to-image generation, multimodality, wordnet, hypernymy, lexical semantics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We design two metrics for text-to-image generation which are based on hypernymy in the WordNet hierarchy and measure model knowledge of broad concepts.
Abstract: Text-to-image synthesis has recently attracted widespread attention from the community due to rapidly improving generation quality and numerous practical applications. However, little is known about the language understanding capabilities of text-to-image models, making it difficult to reason about which prompt formulations a model would understand well. In this work, we measure the capability of popular text-to-image models to understand *hypernymy*, i.e., the "is-a" relation between words. To this end, we design two automatic metrics based on the WordNet semantic hierarchy and existing image classifiers pretrained on ImageNet. These metrics enable quantitative comparison of the linguistic capabilities of text-to-image models and offer a way of finding qualitative differences, such as words that are unknown to a model and are thus difficult for it to draw. We comprehensively evaluate our metrics on various popular text-to-image generation models, including GLIDE, Latent Diffusion, and Stable Diffusion, which allows for a better understanding of their shortcomings in downstream applications.
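The abstract's core idea, checking generated images against the WordNet hypernymy hierarchy via an ImageNet classifier, can be illustrated with a short sketch. The snippet below is not the authors' implementation; it only shows how one might test whether a predicted ImageNet class (identified by its WordNet ID) falls under the prompt's concept in WordNet, assuming NLTK with the WordNet corpus installed (`nltk.download('wordnet')`); the synset names used are illustrative.

```python
from nltk.corpus import wordnet as wn

def is_hyponym_of(predicted_wnid: str, prompt_synset_name: str) -> bool:
    """Return True if the ImageNet class given by its WordNet ID
    (e.g. 'n02099712') lies below the prompt synset (e.g. 'dog.n.01')
    in the WordNet hypernymy hierarchy."""
    predicted = wn.synset_from_pos_and_offset(predicted_wnid[0], int(predicted_wnid[1:]))
    prompt = wn.synset(prompt_synset_name)
    if predicted == prompt:
        return True
    # Walk the transitive closure of the predicted class's hypernyms.
    return prompt in predicted.closure(lambda s: s.hypernyms())

# Example: an image generated from the prompt "a photo of a dog" is classified
# as a Labrador retriever; the prediction is consistent with the hypernym prompt.
lab = wn.synset("Labrador_retriever.n.01")
wnid = f"{lab.pos()}{lab.offset():08d}"  # ImageNet-style WordNet ID
print(is_hyponym_of(wnid, "dog.n.01"))   # True
```

A hypernymy-aware metric in this spirit would aggregate such checks over many generated images per concept, rewarding a model whenever the classifier's prediction is the prompted concept or any of its hyponyms.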
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8163