HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment

ACL ARR 2026 January Submission 6531 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM fairness evaluation, bias evaluation, harm-aware aggregation, demographic bias, cross-domain evaluation, large language models, responsible AI
Abstract: Large language models (LLMs) are increasingly deployed across high-impact domains, from clinical decision support and legal analysis to hiring and education, making pre-deployment fairness and bias evaluation critical. However, existing evaluations lack grounding in real-world scenarios and do not account for differences in harm severity: a biased decision in surgery, for example, should not be weighted the same as a stylistic bias in text summarization. To address this gap, we introduce HALF (Harm-Aware LLM Fairness), a deployment-aligned framework that assesses model bias in realistic applications and weights outcomes by harm severity. HALF organizes evaluation datasets into harm tiers (Severe, Moderate, Mild) based on the specific task and bias type they measure. Harm severity is assigned using deployment-relevant criteria grounded in regulatory and risk assessment frameworks. Our evaluation across eight LLMs shows that (1) models are not consistently fair across datasets, (2) model size or performance does not guarantee fairness, and (3) reasoning models perform better in medical decision support but worse in education. We conclude that HALF exposes a clear gap between benchmark performance and model fairness.
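The harm-aware aggregation described in the abstract can be made concrete with a minimal sketch. Only the tier labels (Severe, Moderate, Mild) come from the abstract; the numeric weights, the dataset names, and the harm_weighted_bias function below are illustrative assumptions, not the paper's actual scheme.

# Minimal sketch of harm-aware aggregation, assuming a weighted-mean scheme.
# Tier labels come from the abstract; the numeric weights and dataset-to-tier
# assignments below are hypothetical illustrations, not the authors' values.

# Hypothetical harm-severity weights per tier.
TIER_WEIGHTS = {"Severe": 3.0, "Moderate": 2.0, "Mild": 1.0}

# Hypothetical mapping of evaluation datasets to harm tiers.
DATASET_TIERS = {
    "clinical_decision_support": "Severe",
    "hiring_screening": "Moderate",
    "text_summarization_style": "Mild",
}

def harm_weighted_bias(bias_scores: dict) -> float:
    """Aggregate per-dataset bias scores into one harm-weighted score.

    bias_scores maps dataset name -> bias score (higher = more biased).
    Each score is weighted by its tier's severity weight, and the
    weighted mean is returned.
    """
    total, weight_sum = 0.0, 0.0
    for dataset, score in bias_scores.items():
        w = TIER_WEIGHTS[DATASET_TIERS[dataset]]
        total += w * score
        weight_sum += w
    return total / weight_sum

if __name__ == "__main__":
    scores = {
        "clinical_decision_support": 0.12,
        "hiring_screening": 0.30,
        "text_summarization_style": 0.45,
    }
    print(f"Harm-weighted bias: {harm_weighted_bias(scores):.3f}")  # 0.235

Under these assumed weights, the severe-tier clinical score contributes three times as much to the aggregate as the mild-tier summarization score, which is the behavior the abstract's surgery-versus-summarization example calls for.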
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: data ethics; model bias/fairness evaluation; model bias/unfairness mitigation; ethical considerations in NLP applications; transparency; policy and governance; reflections and critiques
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 6531