Surface Fairness, Deep Bias: Quantifying Epistemic Injustice in the Clinical Reasoning of Large Language Models
Keywords: bias detection, medical application, LLM
Abstract: Large language models (LLMs) have recently achieved remarkable performance on medical benchmarks, leading to their increasing deployment in clinical decision support and patient consultation systems. However, LLMs trained on real-world corpora inevitably inherit latent societal biases, particularly the gender biases prevalent in clinical practice, which can perpetuate inequities and threaten patient safety. Existing bias evaluations of LLMs in the medical domain focus primarily on surface-level disparities in final outputs, overlooking subtler biases embedded in the models' reasoning processes. To bridge this gap, we propose Clinical Audit for Reasoning Equity ($\textbf{CARE}$), a multi-dimensional evaluation framework designed to detect latent epistemic injustice in LLMs. CARE moves beyond accuracy metrics to audit reasoning trajectories through three complementary lenses: outcome metrics, counterfactual semantic drift, and a double-stage Chain-of-Thought (CoT) audit. To support this evaluation, we introduce the $\textbf{MedFair-CF}$ dataset, a strictly controlled counterfactual benchmark comprising 23,096 samples across five clinical specialties, derived from over 500,000 medical records. Our experiments on state-of-the-art (SOTA) LLMs reveal that even when surface-level predictions appear consistent, models exhibit significant semantic biases along multiple dimensions, including diagnostic confidence, symptom attribution, and logical transitions. Crucially, we find that these implicit biases are driven not by reduced reasoning effort but by the activation of specific stereotype heuristics. These findings provide new insights for guiding the development of more equitable and safer language models.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English, Chinese
Submission Number: 4494