Words Can Be Confusing: Stereotype Bias Removal in Text Classification at the Word Level

Shaofei Shen, Mingzhe Zhang, Weitong Chen, Alina Bialkowski, Miao Xu

PAKDD (4) 2023. Published: 2023, Last Modified: 24 Jul 2023

Abstract: Text classification is a widely used task in natural language processing. However, the presence of stereotype bias in text classification can lead to unfair and inaccurate predictions. Stereotype bias is particularly prevalent in words that are unevenly distributed across classes and are associated with specific categories. This bias can be further amplified in models pre-trained on large natural language datasets. Prior work on removing stereotype bias has mainly focused on specific demographic groups or relied on specific thesauri, without measuring the influence of stereotype words on predictions. In this work, we present a causal analysis of how stereotype bias occurs and affects text classification, and propose a framework to mitigate it. Our framework detects words that may carry stereotype bias using SHAP values and alleviates the bias at the prediction stage through a counterfactual approach. Unlike existing debiasing methods, our framework does not rely on existing stereotype word sets and can dynamically evaluate the influence of words on stereotype bias. Extensive experiments and ablation studies show that our approach effectively improves classification performance while mitigating stereotype bias.
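To make the two stages in the abstract concrete (word-level detection via SHAP values, then a counterfactual correction at prediction time), here is a minimal sketch on a toy linear bag-of-words classifier. It is not the authors' implementation. For a linear model with independent features, the SHAP value of word i is exactly w_i * (x_i - E[x_i]), so no SHAP library is needed. Everything else is a hypothetical stand-in: the vocabulary, weights, baseline frequencies, the per-word class-skew scores `SKEW`, the thresholds in `flag_stereotype_words`, and the alpha-weighted "mask the flagged words and interpolate" correction in `debiased_logit`.

```python
# Hypothetical toy classifier: logit = BIAS + sum(weight * count) over a tiny vocabulary.
WEIGHTS = {"nurse": 1.2, "she": 0.9, "caring": 0.5, "patient": 0.3, "he": -0.8}
BIAS = -0.2
# Assumed mean count of each word in background data (the SHAP baseline E[x_i]).
BASELINE = {"nurse": 0.1, "she": 0.3, "caring": 0.1, "patient": 0.2, "he": 0.3}
# Assumed class-skew score per word (how unevenly it is spread across classes).
SKEW = {"nurse": 0.2, "she": 0.9, "caring": 0.3, "patient": 0.1, "he": 0.9}


def counts(tokens):
    """Bag-of-words counts restricted to the model vocabulary."""
    c = {w: 0 for w in WEIGHTS}
    for t in tokens:
        if t in c:
            c[t] += 1
    return c


def logit(c):
    """Score of the toy linear model for a count vector."""
    return BIAS + sum(WEIGHTS[w] * n for w, n in c.items())


def linear_shap(tokens):
    """Exact SHAP values for a linear model with independent features."""
    c = counts(tokens)
    return {w: WEIGHTS[w] * (c[w] - BASELINE[w]) for w in WEIGHTS}


def flag_stereotype_words(tokens, shap_min=0.5, skew_min=0.8):
    """Hypothetical rule: flag present words that are influential AND class-skewed."""
    present = set(tokens)
    phi = linear_shap(tokens)
    return {
        w for w in WEIGHTS
        if w in present and abs(phi[w]) >= shap_min and SKEW[w] >= skew_min
    }


def debiased_logit(tokens, alpha=1.0):
    """Hypothetical counterfactual correction: remove alpha times the flagged words' effect."""
    flagged = flag_stereotype_words(tokens)
    factual = logit(counts(tokens))
    # Counterfactual input: the same text with the flagged words masked out.
    counterfactual = logit(counts([t for t in tokens if t not in flagged]))
    return factual - alpha * (factual - counterfactual)


tokens = "she is a caring nurse".split()
print(sorted(flag_stereotype_words(tokens)))  # ['she']
print(round(logit(counts(tokens)), 3), round(debiased_logit(tokens), 3))  # 2.4 1.5
```

Here "nurse" has the largest SHAP value but a low assumed skew, so only "she" is flagged, and masking it lowers the score from 2.4 to 1.5. With `alpha=0.0` the original prediction is returned unchanged. Requiring both an attribution threshold and a distributional skew mirrors the abstract's point that stereotype words are both influential and unevenly distributed across classes.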
