Abstract: Text-to-image (T2I) models are increasingly used in impactful real-life applications. As such, there is a growing need to audit these models to ensure that they generate desirable, task-appropriate images. However, systematically inspecting the associations between prompts and generated content in a human-understandable way remains challenging. To address this, we propose \emph{Concept2Concept}, a framework in which we characterize conditional distributions of vision-language models using interpretable concepts and metrics defined in terms of those concepts. This characterization allows us to use our framework to audit models and prompt datasets. To demonstrate, we investigate several case studies of conditional distributions of prompts, such as user-defined distributions or empirical, real-world distributions. Lastly, we implement Concept2Concept as an open-source interactive visualization tool to facilitate use by non-technical end-users. A demo is available at https://tinyurl.com/Concept2ConceptDemo.
Warning: This paper contains discussions of harmful content, including CSAM and NSFW material, which may be disturbing to some readers.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: All changes are in pink/magenta color.
Assigned Action Editor: ~Magda_Gregorova2
Submission Number: 4154