Abstract: Natural language processing models learn powerful language representations from vast amounts of data, but they also inherit the societal biases embedded in that data. Current debiasing research often struggles to balance removing model bias with preserving model performance. Most existing approaches rely on fine-tuning model parameters, and these parameter modifications can make downstream performance unpredictable. In this paper, we propose a novel debiasing framework called FairTriplet. The framework first employs prefix tuning to freeze the parameters of the original pre-trained model, then optimizes the prefix parameters through two debiasing terms. These terms reduce the semantic distance between social groups (e.g., male and female) while increasing the semantic distance between social groups and neutral attributes (e.g., family and occupation) in the semantic space. This removes bias from the model while preserving its performance. Experimental results demonstrate that FairTriplet achieves state-of-the-art (SOTA) debiasing results while maintaining model performance on GLUE downstream tasks.
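The two debiasing terms described above can be illustrated with a minimal sketch. This is a hypothetical formulation for illustration only, not FairTriplet's actual objective: it assumes the two terms take the form of a triplet-style loss over embedding vectors, with an attraction term pulling paired social-group embeddings (e.g., "he"/"she") together and a margin-based repulsion term pushing group embeddings away from neutral-attribute embeddings (e.g., "doctor"); the function name, margin value, and Euclidean distance choice are all assumptions.

```python
import numpy as np

def debias_terms(group_a, group_b, neutral, margin=1.0):
    """Hypothetical triplet-style debiasing loss (illustrative sketch).

    group_a, group_b : embeddings of a paired social group (e.g., male/female)
    neutral          : embedding of a neutral attribute (e.g., an occupation)
    """
    # Term 1: shrink the semantic distance between the two social groups.
    attract = np.linalg.norm(group_a - group_b)
    # Term 2: enlarge the distance between each group and the neutral
    # attribute, up to a margin (hinge keeps the term non-negative).
    repel = sum(
        max(0.0, margin - np.linalg.norm(g - neutral))
        for g in (group_a, group_b)
    )
    return attract + repel

# When the group pair already coincides and the neutral attribute lies
# beyond the margin, both terms vanish.
g = np.array([1.0, 0.0])
loss = debias_terms(g, g.copy(), np.array([5.0, 0.0]))
```

In a prefix-tuning setup, only the prefix parameters would receive gradients from such a loss, leaving the pre-trained model weights frozen.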