From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

ACL ARR 2024 April Submission 519 Authors

16 Apr 2024 (modified: 20 May 2024) · CC BY 4.0
Abstract: Embeddings play a pivotal role in the efficacy of large language models. They are the bedrock on which these models grasp contextual relationships, develop a more nuanced understanding of language, and consequently perform remarkably well on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that these models may inadvertently learn this bias. In this work, we build on seminal prior work and propose \textit{DeepSoftDebias}, an algorithm that uses a neural network to perform 'soft debiasing'. We exhaustively evaluate this algorithm across a variety of state-of-the-art datasets, accuracy metrics, and challenging NLP tasks. We find that \textit{DeepSoftDebias} outperforms current state-of-the-art methods at reducing bias across gender, race, and religion.
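The abstract does not spell out the algorithm, but the general idea behind 'soft debiasing' can be illustrated: learn a transform of the embedding matrix that stays close to the original embeddings while shrinking their component along a learned bias direction. The sketch below is purely illustrative, not the paper's actual DeepSoftDebias method; the linear parameterization, the synthetic data, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 100                      # embedding dimension, vocabulary size (toy values)
X = rng.standard_normal((n, d))     # stand-in for pretrained word embeddings
v = rng.standard_normal(d)
v /= np.linalg.norm(v)              # stand-in for a unit bias direction (e.g. he - she)

# Learn W minimizing ||XW - X||^2 / n  +  lam * ||XWv||^2 / n:
# stay faithful to the originals, but suppress the bias-direction component.
W = np.eye(d)
lam, lr = 5.0, 0.01
for _ in range(500):
    Y = X @ W
    g_rec = 2 * X.T @ (Y - X) / n                    # gradient of reconstruction term
    g_bias = 2 * lam * X.T @ np.outer(Y @ v, v) / n  # gradient of bias penalty
    W -= lr * (g_rec + g_bias)

before = np.mean((X @ v) ** 2)       # mean squared bias component, original
after = np.mean((X @ W @ v) ** 2)    # same quantity after debiasing
```

At convergence the bias component shrinks by roughly a factor of (1 + lam) while the off-direction components are preserved; the paper's contribution replaces this closed-form-style linear map with a trained neural network.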
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Debiasing, Gender Bias, Racial Bias, Bias, LLM
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 519