Inclusive on the Surface, Stereotyped Underneath: How LLMs Infer Gendered Pronouns and Justify Their Choices

ACL ARR 2026 January Submission 2446 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Gender Bias, Stereotype, Hierarchical Bayesian Modeling, Pronoun Inference
Abstract: Large language models (LLMs) often infer social attributes that users do not specify, fundamentally shaping how individuals are represented in everyday writing. We audit binary pronoun inference in underspecified prompts by measuring how six LLMs assign \emph{he/him} vs.\ \emph{she/her} to an implied user and how they rationalize those choices. Prompts span three scenarios (cover letter, potluck, travel), two tones (direct, polite), and scenario-nested semantic factors, such as occupation and hobby profiles. Each trial follows a three-stage pipeline eliciting: (i) a scenario response, (ii) a constrained third-person pronoun description, and (iii) a justification for the pronoun selection. We employ hierarchical Bayesian models to analyze pronoun choice and explanation content, including a compositional model over factual, tonal, stylistic, and emotional cues, alongside Bernoulli models for stereotype surfacing and conditional avoidance. Our results show that pronoun assignments are dominated by scenario-local semantic cues and shift with stylistic phrasing, with polite prompts significantly increasing $p(\text{\emph{she}})$. Under cue-primed elicitation, model rationales mention gender stereotypes at near-ceiling rates, whereas explicit stereotype avoidance remains uncommon. Overall, we characterize a critical failure mode: even when surface pronoun rates vary across contexts, the underlying justificatory space remains heavily anchored in gender-coded associations.
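The abstract's hierarchical Bernoulli structure can be illustrated with a minimal sketch: pronoun choice is modeled as a Bernoulli outcome whose logit sums a global intercept, a scenario-level effect, and a tone (polite) effect. The function name, factor levels, and coefficient values below are purely illustrative assumptions, not the paper's fitted estimates.

```python
import math

def p_she(scenario: str, polite: bool) -> float:
    """Illustrative logistic model for p(she) given scenario and tone.

    Coefficients are made up for demonstration; the paper estimates
    such effects with hierarchical Bayesian models.
    """
    intercept = 0.0                                   # global baseline (assumed)
    scenario_effect = {"cover_letter": -0.4,          # scenario-local semantic cues
                       "potluck": 0.6,
                       "travel": 0.1}
    tone_effect = 0.5 if polite else 0.0              # polite prompts raise p(she)
    logit = intercept + scenario_effect[scenario] + tone_effect
    return 1.0 / (1.0 + math.exp(-logit))             # inverse-logit

# For a fixed scenario, polite phrasing shifts the probability upward,
# mirroring the direction of the effect reported in the abstract.
print(p_she("potluck", polite=False) < p_she("potluck", polite=True))
```

This toy version only captures the direction of the tone effect; the paper's models additionally include compositional cue factors (factual, tonal, stylistic, emotional) and group-level priors.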
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, model bias/unfairness mitigation, ethical considerations in NLP applications, reflections and critiques
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 2446