Inclusive on the Surface, Stereotyped Underneath: How LLMs Infer Gendered Pronouns and Justify Their Choices
Keywords: Gender Bias, Stereotype, Hierarchical Bayesian Modeling, Pronoun Inference
Abstract: Large language models (LLMs) often infer social attributes that users do not specify, fundamentally shaping how individuals are represented in everyday writing. We audit binary pronoun inference in underspecified prompts by measuring how six LLMs assign \emph{he/him} vs.\ \emph{she/her} to an implied user and how they rationalize those choices. Prompts span three scenarios (cover letter, potluck, travel), two tones (direct, polite), and scenario-nested semantic factors such as occupation and hobby profiles. Each trial follows a three-stage pipeline that elicits (i) a scenario response, (ii) a constrained third-person pronoun description, and (iii) a justification for the pronoun selection. We analyze pronoun choice and explanation content with hierarchical Bayesian models: a compositional model over factual, tonal, stylistic, and emotional cues, alongside Bernoulli models of stereotype surfacing and conditional avoidance. Our results show that pronoun assignments are dominated by scenario-local semantic cues and shift with stylistic phrasing, with polite prompts significantly increasing $p(\text{\emph{she}})$. Under cue-primed elicitation, model rationales mention gender stereotypes at near-ceiling rates, whereas explicit stereotype avoidance remains uncommon. Overall, we characterize a critical failure mode: even when surface pronoun rates vary across contexts, the underlying justificatory space remains heavily anchored in gender-coded associations.
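To make the modeling setup concrete, below is a minimal sketch of a hierarchical Bayesian Bernoulli model of $p(\text{\emph{she}})$ of the kind the abstract describes: scenario-level intercepts with partial pooling plus a population-level politeness effect. This is not the paper's actual implementation; the library choice (PyMC), priors, variable names, and toy data are all assumptions for illustration.

```python
# A minimal sketch (not the authors' code) of a hierarchical Bernoulli
# model for p(she), assuming hypothetical column names and toy data.
import numpy as np
import pymc as pm

# Hypothetical trial-level data: scenario index (cover letter, potluck,
# travel), a polite-tone flag, and the binary she/her outcome per trial.
scenario_idx = np.array([0, 0, 1, 1, 2, 2])
polite = np.array([0, 1, 0, 1, 0, 1])
chose_she = np.array([0, 1, 1, 1, 0, 1])

with pm.Model() as model:
    # Population-level intercept and politeness effect (log-odds scale).
    mu = pm.Normal("mu", 0.0, 1.5)
    beta_polite = pm.Normal("beta_polite", 0.0, 1.0)
    # Partial pooling: scenario intercepts vary around the population mean.
    sigma = pm.HalfNormal("sigma", 1.0)
    alpha = pm.Normal("alpha", mu, sigma, shape=3)
    # Bernoulli likelihood on the binary pronoun outcome.
    logit_p = alpha[scenario_idx] + beta_polite * polite
    pm.Bernoulli("chose_she", logit_p=logit_p, observed=chose_she)
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```

Partial pooling lets sparsely observed scenarios borrow strength from the population-level mean while still allowing scenario-local effects, such as a positive posterior on beta_polite, to surface, which is the structure the abstract's politeness finding relies on.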
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, model bias/unfairness mitigation, ethical considerations in NLP applications, reflections and critiques
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 2446