- TL;DR: We propose a neural bias annotator to benchmark models on their robustness to biased text datasets.
- Abstract: In a time where neural networks are increasingly adopted in sensitive applications, algorithmic bias has emerged as an issue with moral implications. While there are myriad ways that a system may be compromised by bias, systematically isolating and evaluating existing systems on such scenarios is non-trivial, i.e., bias may be subtle, natural and inherently difficult to quantify. To this end, this paper proposes the first systematic study of benchmarking state-of-the-art neural models against biased scenarios. More concretely, we postulate that the bias annotator problem can be approximated with neural models, i.e., we propose generative models of latent bias to deliberately and unfairly associate latent features to a specific class. All in all, our framework provides a new way for principled quantification and evaluation of models against biased datasets. Consequently, we find that state-of-the-art NLP models (e.g., BERT, RoBERTa, XLNET) are readily compromised by biased data.
- Keywords: bias, natural language processing, text classification, natural language inference