DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

Zhenyu Hu; Qing Wang; Cao Te; Kuo Liao; Longfei Lu; Liqun Liu; Shuang Li; Hang Chen; Mengge Xue; Honglin Han; Jianan Li; Chao Deng; Peng Shu

12 May 2025 (modified: 29 Oct 2025) · Submitted to NeurIPS 2025 · CC BY 4.0

Keywords: Subject-driven text-to-image generation, Benchmark, Hierarchical sampling, Difficulty and scenario classification, Human-aligned evaluation, Subject Identity Consistency Score (SICS)

TL;DR: This paper introduces DSH-Bench, a comprehensive benchmark for subject-driven text-to-image generation models that enables more rigorous and nuanced assessment of model performance.

Abstract: Subject-driven text-to-image (T2I) generation, which aims to synthesize new images depicting target subjects according to user instructions, has made significant progress. However, evaluating these models remains challenging. Existing benchmarks exhibit two critical limitations: 1) insufficient diversity and comprehensiveness in subject images, and 2) inadequate granularity in assessing model performance across different subject difficulty levels and prompt scenarios. To address these limitations, we propose DSH-Bench, a comprehensive benchmark that enables systematic, multi-perspective analysis of subject-driven T2I models through three principal contributions: 1) a hierarchical taxonomy sampling mechanism that ensures comprehensive subject representation across 58 fine-grained categories, 2) a classification scheme that categorizes both subject difficulty level and prompt scenario for granular assessment of model capabilities, and 3) a novel Subject Identity Consistency Score (SICS) metric for quantifying subject preservation, which achieves a 9.4% higher correlation with human evaluation than existing measures. Through an empirical evaluation of 15 subject-driven T2I models, DSH-Bench uncovers previously obscured limitations in current approaches and identifies concrete directions for future research.

Supplementary Material: zip

Primary Area: Evaluation (e.g., methodology, meta studies, replicability and validity, human-in-the-loop)

Submission Number: 28665

Loading