Who's Asking? Investigating Bias Through the Lens of Disability-Framed Queries in LLMs

Published: 28 Aug 2025, Last Modified: 28 Aug 2025 · CV4A11y · CC BY 4.0
Keywords: large language models, disabilities, disability cues, demographic inference, ableist bias, accessibility, intersectional fairness, bias mitigation
TL;DR: When a query hints at disability, SOTA LLMs seldom abstain and instead tilt their demographic guesses toward ableist stereotypes, underscoring the need for disability-aware intersectional audits and mitigation.
Abstract: Large Language Models (LLMs) routinely infer users' demographic traits from phrasing alone, which can result in biased responses even when no explicit demographic information is provided. The role of disability cues in shaping these inferences remains largely uncharted. We therefore present the first systematic audit of disability-conditioned demographic bias across eight state-of-the-art instruction-tuned LLMs ranging from 3B to 72B parameters. Using a balanced template corpus that pairs nine disability categories with six real-world business domains, we prompt each model to predict five demographic attributes: gender, socioeconomic status, education, cultural background, and locality, under both neutral and disability-aware conditions. Across a varied set of prompts, models deliver a definitive demographic guess in up to 97% of cases, exposing a strong tendency to make arbitrary inferences with no clear justification. Disability context heavily shifts predicted attribute distributions, and domain context can further amplify these deviations. We observe that larger models are simultaneously more sensitive to disability cues and more prone to biased reasoning, indicating that scale alone does not mitigate stereotype amplification. Our findings reveal persistent intersections between ableism and other demographic stereotypes, pinpointing critical blind spots in current alignment strategies. We release our evaluation framework and results to encourage disability-inclusive benchmarking, and recommend integrating abstention calibration and counterfactual fine-tuning to curb unwarranted demographic inference. Code and data will be released on acceptance.
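The balanced template corpus described above can be sketched as follows. This is a minimal illustration only: the abstract does not list the paper's actual nine disability categories, six business domains, or prompt wording, so the category names and template strings below are hypothetical placeholders. It shows the crossing of (disability, domain, attribute) triples into paired neutral and disability-aware prompts.

```python
from itertools import product

# Hypothetical example values -- the paper's actual nine disability
# categories and six business domains are not given in the abstract.
DISABILITIES = ["visual impairment", "hearing impairment", "mobility impairment"]
DOMAINS = ["banking", "healthcare"]
# The five audited attributes, as listed in the abstract.
ATTRIBUTES = ["gender", "socioeconomic status", "education",
              "cultural background", "locality"]

# Illustrative prompt templates; the paper's real templates may differ.
NEUTRAL_TMPL = "A customer asks about {domain} services. What is their {attribute}?"
AWARE_TMPL = ("A customer with a {disability} asks about {domain} services. "
              "What is their {attribute}?")

def build_corpus(disabilities, domains, attributes):
    """Yield a balanced corpus: for every (disability, domain, attribute)
    triple, emit one neutral and one disability-aware prompt."""
    corpus = []
    for dis, dom, attr in product(disabilities, domains, attributes):
        corpus.append({
            "condition": "neutral",
            "disability": None,
            "domain": dom,
            "attribute": attr,
            "prompt": NEUTRAL_TMPL.format(domain=dom, attribute=attr),
        })
        corpus.append({
            "condition": "disability-aware",
            "disability": dis,
            "domain": dom,
            "attribute": attr,
            "prompt": AWARE_TMPL.format(disability=dis, domain=dom, attribute=attr),
        })
    return corpus

corpus = build_corpus(DISABILITIES, DOMAINS, ATTRIBUTES)
```

Each prompt would then be sent to every model under audit, and responses scored for whether the model abstains or commits to a demographic guess.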
Submission Number: 4