Abstract: Text-to-image diffusion models often encode correlations between demographic prompts and non-demographic attributes, some of which may be expected (e.g., gray hair with older age) while others may raise fairness concerns (e.g., cultural markers appearing only for certain ethnicities). Existing analyses of such correlations have been largely qualitative. In this work, we present a counterfactual-style diagnostic framework for stress-testing diffusion models. Inspired by stress-testing approaches (e.g., Veitch et al.), our method uses image-conditioned generation to approximately preserve facial features while systematically varying demographic variables in prompts (gender, ethnicity, age). This setup enables controlled observation of how non-demographic attributes (e.g., facial hair, accessories, hairstyles) shift under demographic changes. We introduce Counterfactual-style Invariance (CIV), along with positive and negative variance metrics (PCV, NCV), to quantify attribute stability and directional changes. Applying this framework across multiple text-to-image models reveals pervasive, prompt-dependent entanglements: for example, bushy eyebrows co-occur in 62.5% of generations with "Middle Eastern" prompts, and black hair is amplified in 64.8% of "East Asian" generations. These findings show that generative models can amplify or introduce associations between demographic variables and observed attributes, highlighting the need for systematic diagnostic evaluations to better understand and mitigate fairness risks in text-to-image generation.
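The abstract does not give formal definitions of CIV, PCV, and NCV, but one plausible minimal formulation, assuming binary presence/absence indicators for each non-demographic attribute, compares a base generation against a counterfactual generation whose demographic prompt was changed. The sketch below (hypothetical; the function name and formulation are assumptions, not the paper's implementation) computes the fraction of attributes that stay unchanged (invariance), newly appear (positive variance), or disappear (negative variance):

```python
def counterfactual_metrics(base, counterfactual):
    """Compare binary attribute vectors (1 = attribute present in the image).

    Returns (CIV, PCV, NCV), all in [0, 1]:
      CIV - fraction of attributes whose presence is unchanged,
      PCV - fraction that newly appear under the demographic change,
      NCV - fraction that disappear under the demographic change.
    """
    assert len(base) == len(counterfactual) and len(base) > 0
    n = len(base)
    unchanged = sum(b == c for b, c in zip(base, counterfactual))
    gained = sum(b == 0 and c == 1 for b, c in zip(base, counterfactual))
    lost = sum(b == 1 and c == 0 for b, c in zip(base, counterfactual))
    return unchanged / n, gained / n, lost / n

# Example with 4 attributes (e.g., beard, glasses, hat, bushy eyebrows):
# the counterfactual gains the second attribute and loses the fourth.
civ, pcv, ncv = counterfactual_metrics([1, 0, 0, 1], [1, 1, 0, 0])
# civ = 0.5, pcv = 0.25, ncv = 0.25
```

In practice these per-image scores would be averaged over many prompt pairs and seeds, so that a high NCV for a specific attribute (say, an accessory vanishing whenever the ethnicity term changes) flags a directional entanglement of the kind the abstract reports.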
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Marcus_A_Brubaker1
Submission Number: 6806