Debiasing Pretrained Text Encoders by Paying Attention to Paying Attention

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 (ICLR 2022 Submission)
Readers: Everyone
Keywords: Fairness, Pretrained Text Encoders, Self-Attention, Knowledge Distillation, Social Biases, Debiasing
Abstract: Recent studies in fair representation learning have observed a strong inclination for natural language processing (NLP) models to exhibit discriminatory stereotypes across gender, religion, race, and other social constructs. Compared to the progress made in reducing bias in static word embeddings, fairness in sentence-level text encoders has received little consideration despite their wider applicability in contemporary NLP tasks. In this paper, we propose a debiasing method for pre-trained text encoders that both reduces social stereotypes and incurs almost no loss of semantic information. Unlike previous studies that manipulate the embeddings directly, we propose to look deeper into the operation of these encoders and pay more attention to the way they pay attention to different social groups. We find that the attention mechanism lies at the root of these stereotypes. We then debias the model by redistributing the attention scores of a text encoder so that it forgets any preference for historically advantaged groups and attends to all social classes with the same intensity. Our experiments confirm that our method successfully reduces bias with little damage to the semantic representations.
One-sentence Summary: In this work, we reduce the social biases encoded in transformer-based text encoders by equalizing their internal attention scores across social groups.
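To make the idea in the abstract concrete, here is a minimal sketch (not the authors' released code) of attention equalization with a distillation term: it penalizes any gap in the attention mass a BERT encoder assigns to tokens from two group word lists, while a frozen copy of the original encoder keeps the semantics anchored. The model name, word lists, trade-off weight, and example sentence are all illustrative assumptions.

```python
# Hedged sketch of attention-score equalization plus knowledge distillation.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
teacher = AutoModel.from_pretrained("bert-base-uncased")  # frozen original encoder
student = AutoModel.from_pretrained("bert-base-uncased")  # copy being debiased
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Hypothetical paired group terms; a real method would use far larger lists.
ids_a = tokenizer.convert_tokens_to_ids(["he", "man", "father"])
ids_b = tokenizer.convert_tokens_to_ids(["she", "woman", "mother"])

def attention_mass(attentions, input_ids, target_ids):
    """Mean attention received by the `target_ids` tokens,
    averaged over layers, heads, and query positions."""
    mask = torch.zeros_like(input_ids, dtype=torch.bool)
    for t in target_ids:
        mask |= input_ids.eq(t)
    # attentions: one (batch, heads, query, key) tensor per layer
    per_key = torch.stack(attentions).mean(dim=(0, 2, 3))  # -> (batch, key)
    return per_key[mask].mean()  # real training should guard empty masks

# Example input containing a term from each group.
sentence = "The doctor said he would page the nurse because she was on call."
batch = tokenizer(sentence, return_tensors="pt")
out = student(**batch, output_attentions=True, output_hidden_states=True)
with torch.no_grad():
    ref = teacher(**batch, output_hidden_states=True)

# Fairness term: equal attention mass for the two social groups.
loss_fair = (attention_mass(out.attentions, batch["input_ids"], ids_a)
             - attention_mass(out.attentions, batch["input_ids"], ids_b)).abs()
# Distillation term: stay close to the teacher's final hidden states.
loss_kd = torch.nn.functional.mse_loss(out.hidden_states[-1],
                                       ref.hidden_states[-1])
loss = loss_fair + 0.1 * loss_kd  # 0.1 is an arbitrary trade-off weight
loss.backward()
```

In practice such a loss would be summed over a corpus and over many paired word lists; the paper's actual objective and training details may differ from this sketch.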
Supplementary Material: zip