Auditing Who Appears to Belong: A Large-Scale Empirical Study of Bias in Deployed Text-to-Image Systems for Software Engineering
Keywords: Generative AI Bias, AI-Aware Software Engineering, Empirical Bias Analysis, Responsible AI in Education and Recruitment
Abstract: Generative image systems are increasingly embedded in software engineering artifacts such as slides, documentation, and recruiting
collateral, shaping implicit signals about who is seen to “belong.” We present a mixed-methods empirical audit of 880 images generated
by four widely used text-to-image models (GPT-4o/DALL·E 3, Llama-4/Emu, Qwen-3-235B-A22B, Stable Diffusion) using 22 demographically neutral prompts varying role, seniority, team context, geography, and language. Independent human annotations, triangulated with automated raters, capture both demographic representation (gender, race/ethnicity, age) and portrayal cues (setting, attire, props, emotion). We analyze intersectional distributions and benchmark them against occupational reference statistics. Across models, outputs consistently converge on a narrow archetype: young men dominate, women and older professionals are rare, and several racial and ethnic groups are underrepresented. Prompt variation modestly shifts racialized appearance but leaves gender imbalance largely intact, while model differences are primarily of degree rather than direction. We translate these findings into actionable implications for AI-aware software engineering practice, such as representational audits and diversity-aware defaults, arguing that evaluation of AI systems in software engineering must account for the societal signals conveyed by generated imagery alongside functional performance.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public.
Paper Type: Full-length papers (i.e., case studies, theoretical, or applied research papers), 8 pages
Reroute: false
Submission Number: 8