DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

12 May 2025 (modified: 29 Oct 2025) | Submitted to NeurIPS 2025 | Readers: Everyone | CC BY 4.0
Keywords: Subject-driven text-to-image generation, Benchmark, Hierarchical sampling, Difficulty and scenario classification, Human-aligned evaluation, Subject Identity Consistency Score (SICS)
TL;DR: This paper introduces DSH-Bench, a comprehensive benchmark for subject-driven text-to-image generation models that enables more rigorous and nuanced assessment of model performance.
Abstract: Significant progress has been achieved in subject-driven text-to-image (T2I) generation, which aims to synthesize new images depicting target subjects according to user instructions. However, evaluating these models remains challenging. Existing benchmarks exhibit two critical limitations: 1) insufficient diversity and comprehensiveness in subject images, and 2) inadequate granularity in assessing model performance across different subject difficulty levels and prompt scenarios. To address these limitations, we propose DSH-Bench, a comprehensive benchmark that enables systematic multi-perspective analysis of subject-driven T2I models through three principal innovations: 1) a hierarchical taxonomy sampling mechanism ensuring comprehensive subject representation across 58 fine-grained categories, 2) a classification scheme categorizing both subject difficulty level and prompt scenario for granular model capability assessment, and 3) a novel Subject Identity Consistency Score (SICS) metric that achieves 9.4% higher correlation with human evaluation than existing measures of subject preservation. Through empirical evaluation of 15 subject-driven T2I models, DSH-Bench uncovers previously obscured limitations in current approaches and establishes concrete directions for future research.
Supplementary Material: zip
Primary Area: Evaluation (e.g., methodology, meta studies, replicability and validity, human-in-the-loop)
Submission Number: 28665