Debiasing Transformer Models through Weight Masking: Addressing Gender Confounding Shift in Dementia Detection

ACL ARR 2024 June Submission 4800 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Deep language models are often described as "black-box" systems due to their opaque inference procedures. This makes it difficult to understand what information they capture and how it is encoded within transformer networks, raising the possibility that encoded biases may remain undetected. This work addresses confounding bias learned during model fine-tuning, when a pretrained language model is adapted to downstream domains and tasks. Building on previous methodologies, we extend them by proposing the Extended Confounding Filter and the Dual Filter. These methods aim to isolate, through distinct training phases, the weights within the transformer network that are associated with confounding variables, and to disrupt them. We evaluate these methods on the DementiaBank dataset, a first-person narrative dataset containing language from patients with cognitive impairment and from healthy controls. We demonstrate the applicability of the proposed methods to dementia detection as a means of correcting for gender-related disparities in class distribution at training time. Our results show that transformer models can overfit to the subpopulation distribution in the training data. By disrupting the weights associated with known confounders, we show that fairer models can be achieved, with reduced prediction bias towards specific subgroups. Moreover, our findings highlight the model's resilience to weight deletion and reveal a trade-off between performance in dementia detection and the reduction of disparities across gender groups.
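The abstract does not spell out how weights associated with a confounder are isolated, but a common Confounding Filter-style recipe is: fine-tune a copy of the model on the confounder prediction task (here, gender), then zero out the weights that changed most, on the assumption that those weights encode the confound. The sketch below is a minimal, hedged illustration of that idea on flattened weight vectors; the function names and the top-k selection rule are assumptions, not the paper's exact Extended Confounding Filter or Dual Filter procedure.

```python
def top_k_change_mask(w_task, w_confound, k):
    """Return a 0/1 mask that zeroes the k weights with the largest
    absolute difference between task-tuned and confounder-tuned weights.
    (Illustrative rule only; the paper's selection criterion may differ.)"""
    deltas = [abs(a - b) for a, b in zip(w_task, w_confound)]
    # indices of the k largest weight changes, i.e. the weights most
    # implicated in learning the confounder task
    top = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)[:k]
    mask = [1.0] * len(w_task)
    for i in top:
        mask[i] = 0.0
    return mask

def apply_mask(weights, mask):
    """Element-wise product: masked weights are set to zero."""
    return [w * m for w, m in zip(weights, mask)]

# Toy example: flattened weights of one layer before and after
# fine-tuning on the confounder (positions 1 and 3 moved the most).
w_task = [0.5, -1.2, 0.3, 0.9, -0.1]
w_confound = [0.5, -0.2, 0.3, 2.0, -0.1]
mask = top_k_change_mask(w_task, w_confound, k=2)
print(mask)  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

In practice this would be applied per weight matrix of the transformer (e.g., attention and feed-forward parameters), and the masked model would then be re-evaluated for both task accuracy and subgroup disparity.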
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Confounding Shift, Weight Masking, Fairness
Contribution Types: Model analysis & interpretability, Reproduction study, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 4800