Track: Creative demo
Keywords: Hallucination, Text-to-image, Evaluation metrics, Prompt alignment, Object hallucination, Attribute hallucination, Relation hallucination, Model bias, Generative evaluation
TL;DR: We define hallucination in text-to-image diffusion models as a complementary upper-bound evaluation dimension—capturing unintended objects, attributes, or relations introduced beyond the prompt.
Abstract: In language and vision–language models, hallucination is broadly understood as content generated from a model’s prior knowledge or biases rather than from the given input. While this phenomenon has been studied in those domains, it has not been clearly framed for text-to-image (T2I) generative models. Existing evaluations mainly focus on alignment, checking whether prompt-specified elements appear, but overlook what the model generates beyond the prompt. We argue for defining hallucination in T2I as bias-driven deviations from the prompt and propose a taxonomy with three categories: attribute, relation, and object hallucinations. This framing introduces an upper bound for evaluation and surfaces hidden biases, providing a foundation for richer assessment of T2I models.
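To make the taxonomy concrete, the sketch below illustrates one possible way to operationalize it. It is a minimal, hypothetical Python example, not the paper's method: the `SceneGraph` structure, the `hallucination_report` function, and the assumption that some external detector or captioner parses the generated image into objects, attributes, and relations are all illustrative choices.

```python
"""Toy sketch of the proposed upper-bound check: flag content a T2I model
introduces beyond the prompt. All names here are illustrative; the paper
does not prescribe a specific detector or scoring rule."""

from dataclasses import dataclass


@dataclass(frozen=True)
class SceneGraph:
    """Minimal scene description: objects, (object, attribute) pairs,
    and (subject, relation, object) triples."""
    objects: frozenset
    attributes: frozenset
    relations: frozenset


def hallucination_report(prompt: SceneGraph, image: SceneGraph) -> dict:
    """Return content present in the generated image but absent from the
    prompt, split into the three taxonomy categories."""
    return {
        # Object hallucination: an entity never mentioned in the prompt.
        "object": set(image.objects - prompt.objects),
        # Attribute hallucination: an unrequested property of a prompt object.
        "attribute": {(o, a) for (o, a) in image.attributes - prompt.attributes
                      if o in prompt.objects},
        # Relation hallucination: an unrequested interaction between prompt objects.
        "relation": {(s, r, o) for (s, r, o) in image.relations - prompt.relations
                     if s in prompt.objects and o in prompt.objects},
    }


if __name__ == "__main__":
    # Prompt: "a dog next to a bench"; image side parsed by some detector (assumed).
    prompt = SceneGraph(frozenset({"dog", "bench"}), frozenset(), frozenset())
    image = SceneGraph(frozenset({"dog", "bench", "leash"}),
                       frozenset({("dog", "red collar")}),
                       frozenset({("dog", "sitting on", "bench")}))
    print(hallucination_report(prompt, image))
```

Note that standard alignment metrics check the opposite direction (prompt elements missing from the image); the report above captures the complementary, beyond-the-prompt content that the abstract frames as the upper-bound evaluation dimension.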
Submission Number: 41