On the interpretability and significance of bias metrics in texts: a PMI-based approach

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone

Abstract: In recent years, word embeddings have become a popular tool for measuring bias in texts. Although these measures have proven effective at detecting a wide variety of biases, metrics based on word embeddings lack transparency, explainability and interpretability. In this study, we propose a PMI-based metric to quantify biases in texts. We prove that this metric can be approximated by an odds ratio, which makes it possible to estimate a confidence interval and the statistical significance of textual bias. The PMI-based measure can also be expressed as a function of conditional probabilities, which gives it a simple interpretation in terms of word co-occurrences. Our approach performs comparably to GloVe-based and skip-gram-based metrics in experiments on gender-occupation and gender-name associations. We discuss the advantages and disadvantages of methods based on first-order versus second-order co-occurrences with respect to the interpretability of the metric and the sparseness of the data.
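
The relationship between a PMI difference, conditional probabilities and an odds ratio can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the definition, that the bias of a word x between two context groups A and B is PMI(x, A) - PMI(x, B), which reduces to log[P(x|A) / P(x|B)]. The function names, the count arguments and the numbers in the usage lines are hypothetical, and the interval is the standard Wald interval for a log odds ratio, not necessarily the estimator used in the paper.

```python
import math


def pmi_bias(n_xa, n_a, n_xb, n_b):
    """Assumed bias definition: PMI(x, A) - PMI(x, B).

    n_xa: co-occurrences of word x with group A; n_a: all co-occurrences of A.
    n_xb, n_b: the same counts for group B.
    The marginal P(x) cancels, leaving log[P(x|A) / P(x|B)].
    """
    return math.log((n_xa / n_a) / (n_xb / n_b))


def log_odds_ratio(n_xa, n_a, n_xb, n_b, z=1.96):
    """Log odds ratio of the 2x2 co-occurrence table with a Wald interval."""
    a, b = n_xa, n_a - n_xa  # group A: with x, without x
    c, d = n_xb, n_b - n_xb  # group B: with x, without x
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log OR
    return log_or, (log_or - z * se, log_or + z * se)


# Hypothetical counts: x co-occurs 30 times with A and 10 times with B,
# out of 10,000 co-occurrences for each group.
bias = pmi_bias(30, 10_000, 10, 10_000)
log_or, (ci_low, ci_high) = log_odds_ratio(30, 10_000, 10, 10_000)
print(f"PMI bias = {bias:.3f}, log OR = {log_or:.3f}, "
      f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

When x is rare relative to both groups, the two quantities nearly coincide, and an interval that excludes zero indicates a statistically significant association at the chosen level.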
