The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

16 Jun 2024
Abstract: Prompt-based "diversity interventions" are a common way to make Text-to-Image (T2I) models depict individuals with a wider range of racial and gender traits. However, this strategy can produce nonfactual demographic distributions, especially when the subjects are real historical figures. In this work, we propose **DemOgraphic FActualIty Representation (DoFaiR)**, a benchmark that quantifies the trade-off between applying diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 test instances, a set of diversity prompts, and evaluation metrics that reveal the factuality tax of diversity instructions through an automated, fact-checked, and evidence-supported evaluation pipeline. Experiments with DALL-E 3 on DoFaiR show that diversity-oriented instructions increase the number of distinct gender and racial groups in generated images at the cost of accurate historical demographic distributions. To address this issue, we propose **Fact-Augmented Intervention** (FAI), which instructs a Large Language Model (LLM) to reflect on factual information about the gender and racial composition of generation subjects in history and to incorporate it into the generation context of T2I models. By grounding model generations in these reflected historical facts, FAI remarkably preserves demographic factuality under diversity interventions while also boosting diversity.
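To make the "factuality tax" concrete, the sketch below uses two deliberately simple measures that are illustrative assumptions, not DoFaiR's actual metrics: diversity as the number of distinct demographic groups among the depicted individuals, and factuality as the fraction of depicted individuals whose group is attested for the subject in the historical record. The group labels "A" to "D" and both generations are hypothetical placeholders, not benchmark data.

```python
def diversity(groups):
    """Illustrative diversity: number of distinct demographic groups depicted."""
    return len(set(groups))


def factuality(groups, historical_groups):
    """Illustrative factuality: share of depicted individuals whose group
    appears in the (hypothetical) historical ground truth."""
    if not groups:
        return 0.0
    return sum(g in historical_groups for g in groups) / len(groups)


# Hypothetical ground truth: the historical subjects all belonged to group "A".
truth = {"A"}

# Hypothetical per-person group labels for two generations of the same prompt.
baseline = ["A", "A", "A", "A"]      # no diversity intervention
intervened = ["A", "B", "C", "D"]    # with a diversity intervention

for name, groups in [("baseline", baseline), ("intervened", intervened)]:
    print(name, diversity(groups), factuality(groups, truth))

# The drop in factuality paid for the gain in diversity.
tax = factuality(baseline, truth) - factuality(intervened, truth)
print("factuality tax:", tax)
```

Under these toy measures, the intervention raises diversity from 1 to 4 while factuality falls from 1.0 to 0.25, which is the kind of trade-off DoFaiR is designed to quantify with its own evidence-supported metrics.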
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Factuality, Diversity, Text-to-Image Generation, Multimodality, Generative Models
Contribution Types: Model analysis & interpretability
Languages Studied: English
