Abstract: Multimodal large language models (MLLMs) and text-to-image (T2I) systems are pervasive, yet how stereotypes propagate across their pipelines remains unclear. We present a model-agnostic auditing framework that evaluates joint stereotype formation across T2I and MLLM pipelines, using four T2I models and five MLLMs. We combine seven nationalities (American, Indian, Iranian, Japanese, Mexican, Nigerian, Russian) with five gender terms (man, woman, boy, girl, person) to generate a set of images, which we then evaluate across attributes and traits. For this evaluation, we also generate a neutral baseline image set and produce distance and radar plots. t-SNE embeddings and distance plots reveal tight nationality clusters and a drift of gender-neutral prompts toward “man”. We further introduce five metrics: TDS and WTD to quantify trait shifts; SDI and OM for label dominance and overlap; and MCS for corruption-induced instability. TDS and WTD show minimal deviation for American groups and maximal deviation for Nigerian groups, indicating that assigned physical traits can be nationality-specific. Frequency plots and treemaps, together with SDI and OM, indicate an over-reliance on a small set of descriptive words. MCS shows that even mild image degradations yield 15-45% meaningful label changes and accuracy drops, indicating that noise materially affects predictions. Our framework offers actionable and reproducible tools for auditing stereotype risk in multimodal AI.
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Farzan_Farnia1
Submission Number: 6576