Keywords: fairness, bias, bias metrics, bias evaluation
Abstract: Social bias in language models is typically measured with token probability (TP) metrics, which are broadly applicable but have been criticized for their distance from real-world language model use cases and harms. In this work, we test natural language inference (NLI) as a more realistic alternative bias metric. In extensive experiments across seven LM families, we show that NLI and TP bias evaluation behave substantially differently, with very low correlation among different NLI metrics and between NLI and TP metrics. NLI metrics are more brittle and unstable, slightly less sensitive to the wording of counterstereotypical sentences, and slightly more sensitive to the wording of tested stereotypes than TP approaches. Given this conflicting evidence, we conclude that neither token probability nor natural language inference is a "better" bias metric in all cases. We do not find sufficient evidence to justify NLI as a complete replacement for TP metrics in bias evaluation.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 10753