T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

15 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: text-to-image generation benchmark, reasoning-informed text-to-image generation
TL;DR: A benchmark evaluating the reasoning capabilities of text-to-image models
Abstract: Text-to-image (T2I) generative models have achieved remarkable progress, demonstrating exceptional capability in synthesizing high-quality images from textual prompts. While existing research and benchmarks have extensively evaluated the ability of T2I models to follow the literal meaning of prompts, their ability to reason over prompts to uncover implicit meaning and contextual nuances remains underexplored. To bridge this gap, we introduce T2I-ReasonBench, a novel benchmark designed to probe the reasoning capabilities of T2I models. T2I-ReasonBench comprises 800 meticulously designed prompts organized into four dimensions: (1) Idiom Interpretation, (2) Textual Image Design, (3) Entity Reasoning, and (4) Scientific Reasoning. These dimensions challenge models to infer implicit meaning, integrate domain knowledge, and resolve contextual ambiguities. To quantify performance, we introduce a two-stage evaluation framework: a large language model (LLM) generates prompt-specific question-criterion pairs that assess whether the image includes the essential elements resulting from correct reasoning; a multimodal LLM (MLLM) then scores the generated image against these criteria. Experiments across 16 state-of-the-art T2I and unified multimodal models reveal critical limitations in reasoning-informed generation. Our comprehensive analysis indicates that the bottleneck of current models lies in reasoning rather than generation. Our findings underscore the necessity of improving reasoning capabilities in next-generation T2I and unified multimodal systems.
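The two-stage evaluation framework described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `generate_criteria` and `score_image` are hypothetical stubs standing in for the LLM and MLLM calls, and the example criteria and weights are invented for illustration.

```python
# Minimal sketch of the two-stage evaluation: an LLM derives prompt-specific
# question-criterion pairs (stage 1), then an MLLM scores the generated image
# against each criterion (stage 2), and the weighted scores are aggregated.

def generate_criteria(prompt):
    """Stage 1 (LLM): produce question-criterion pairs for one prompt.
    Stubbed with a fixed example for the idiom 'break the ice'; a real
    system would query an LLM here."""
    return [
        {"question": "Does the image depict a social interaction "
                     "rather than literal ice?", "weight": 0.6},
        {"question": "Is the implied meaning (easing tension) "
                     "visually conveyed?", "weight": 0.4},
    ]

def score_image(image, question):
    """Stage 2 (MLLM): score the image against one criterion on [0, 1].
    Stubbed with a constant; a real system would query a multimodal LLM."""
    return 1.0

def evaluate(prompt, image):
    """Weighted average of per-criterion scores for one prompt-image pair."""
    criteria = generate_criteria(prompt)
    total_weight = sum(c["weight"] for c in criteria)
    weighted = sum(c["weight"] * score_image(image, c["question"])
                   for c in criteria)
    return weighted / total_weight

print(evaluate("break the ice", image=None))  # 1.0 with the stub scorer
```

Separating criterion generation from scoring lets the rubric stay prompt-specific while keeping the scorer a simple, uniform visual-question-answering step.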
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 5919