On The Effectiveness of Gender Debiasing Methods in Removing Gender Information From Model Representations

Anonymous

16 Oct 2022 (modified: 05 May 2023) · ACL ARR 2022 October Blind Submission · Readers: Everyone
Keywords: Natural Language Processing, Fairness, Bias, Gender Equality, Transformer Models, BERT
Abstract: Large pre-trained models such as BERT have been shown to exhibit biased behavior towards demographic groups defined by attributes such as gender, race, or religion. Although numerous debiasing methods have been proposed, little prior research has examined how effectively these methods remove the latent demographic information encoded in a model's internal representations. We examine the effectiveness of several recent bias mitigation methods in removing stereotypical gender information from internal model representations using Minimum Description Length (MDL) probing. We find that the apparent effectiveness of current debiasing techniques is not necessarily indicative of reduced latent gender bias in representations. Furthermore, we investigate the effect of debiasing methods on internal representations using layerwise probing, showing that they tend to concentrate gender information in a few layers. Finally, we apply several state-of-the-art debiasing methods only to the layers with the highest concentration of gender information, finding that targeting these layers produces only minimal changes in model fairness and downstream performance.
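The abstract names Minimum Description Length (MDL) probing as the measurement tool. The paper provides no code on this page, so the following is a minimal, self-contained sketch of the online-coding variant of MDL probing (Voita & Titov, 2020), using a logistic-regression probe. The random features standing in for frozen model representations and the binary labels standing in for gender annotations are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of MDL probing via online coding (Voita & Titov, 2020).
# Hypothetical setup: X is a matrix of frozen model representations and
# y holds binary gender labels; neither comes from the paper's actual data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def online_code_mdl(X, y, fractions=(0.001, 0.002, 0.004, 0.008, 0.016,
                                     0.032, 0.0625, 0.125, 0.25, 0.5)):
    """Return the online codelength (in bits) of labels y given features X.

    The data is revealed in increasing portions; at each step a fresh probe
    is trained on the portion seen so far and used to encode the next block.
    A shorter codelength means the labels are easier to extract from the
    representations, i.e. more gender information is present.
    """
    n = len(y)
    n_classes = len(np.unique(y))
    cuts = sorted({max(2, int(f * n)) for f in fractions})
    # Cost of transmitting the first block under a uniform code.
    codelength = cuts[0] * np.log2(n_classes)
    for start, end in zip(cuts, cuts[1:] + [n]):
        if len(np.unique(y[:start])) < 2:
            # A probe cannot be trained on a single class; encode uniformly.
            codelength += (end - start) * np.log2(n_classes)
            continue
        probe = LogisticRegression(max_iter=1000).fit(X[:start], y[:start])
        probs = probe.predict_proba(X[start:end])
        # log_loss sums natural-log losses over the block; convert to bits.
        block_nats = log_loss(y[start:end], probs,
                              labels=probe.classes_, normalize=False)
        codelength += block_nats / np.log(2)
    return codelength

# Toy usage with random features standing in for BERT representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))
y = rng.integers(0, 2, size=2000)
print(f"online codelength: {online_code_mdl(X, y):.1f} bits")
```

In this kind of analysis, codelengths are compared before and after debiasing, and per layer for layerwise probing: if a debiasing method truly removed gender information, the codelength should rise toward that of a uniform code.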
Paper Type: long
Research Area: Ethics, Bias, and Fairness