Abstract: Fine-tuned models have been shown to reproduce biases present in their training data, which by default reflects the majority perspective. While this process has been shown to marginalise minority perspectives, proposed solutions either fail to preserve the nuances of the original data or rest on strong a priori assumptions about annotators that themselves bias model training. We propose an approach that trains models in a purely annotator demographic-agnostic manner, extracts latent embeddings informed by annotator behaviour during training, and clusters annotators according to their behaviour over the respective corpus. The resulting clusters are validated post hoc via internal and external quantitative validity metrics, as well as a qualitative analysis. Our results demonstrate the strong generalisation capability of our framework: the resulting clusters are robust while also capturing minority perspectives associated with different demographic factors across two distinct datasets.
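The sketch below is a minimal illustration, not the authors' implementation, of the clustering-and-validation step the abstract describes: annotator behaviour embeddings are grouped (k-means is an assumed choice, as the abstract does not name an algorithm) and the result is scored with the silhouette coefficient as one example of an internal validity metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Placeholder data: one 32-dimensional latent embedding per annotator,
# standing in for embeddings extracted from a demographic-agnostic model.
annotator_embeddings = rng.normal(size=(200, 32))

best_k, best_score = None, -1.0
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(annotator_embeddings)
    # Internal validation: silhouette coefficient over the candidate clustering.
    score = silhouette_score(annotator_embeddings, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"selected k={best_k} with silhouette score {best_score:.3f}")
```

External validation (e.g. comparing clusters against held-out annotator demographics) and the qualitative analysis mentioned in the abstract would follow this step; they are omitted here.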
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English