Keywords: fairness, bias, bias metrics, bias evaluation
Abstract: Social bias in language models is typically measured with token probability (TP) metrics, which are broadly applicable but have been criticized for their distance from real-world language model use cases and harms. In this work, we test natural language inference (NLI) as a more realistic alternative bias metric. In extensive experiments across seven LM families, we show that NLI and TP bias evaluation behave substantially differently, with very low correlation among different NLI metrics and between NLI and TP metrics. NLI metrics are more brittle and unstable, slightly less sensitive to the wording of counterstereotypical sentences, and slightly more sensitive to the wording of tested stereotypes than TP approaches. Given this conflicting evidence, we conclude that neither token probability nor natural language inference is a "better" bias metric in all cases. We do not find sufficient evidence to justify NLI as a complete replacement for TP metrics in bias evaluation.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 10753