Debiasing Language Models Using Energy-Guided Ordinary Differential Equations

22 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Language Model, Debiasing, Ordinary Differential Equation, Energy-based Model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper proposes a new method to reduce LM biases in a continuous latent space, resulting in debiased generations without sacrificing semantic content.
Abstract: Language Models (LMs) excel at learning from training data. However, they often inadvertently absorb societal biases present in that data, raising fairness concerns in their applications. In response, this paper introduces a novel method to reduce such biases. Our approach leverages the gradient of an Energy-Based Model (EBM) to guide Ordinary Differential Equation (ODE) sampling within a latent space. First, we construct a latent space and link it to the LM's text space through efficient tuning. Then, we train classifiers in this space that recognize specific bias attributes. By integrating these classifiers into an EBM framework, we use the EBM gradient to gradually steer the ODE solver toward less-biased samples in the latent space. Finally, the LM decodes the latent sample back into text, yielding debiased generations across multiple attributes. A preliminary evaluation demonstrates that our method reduces joint bias while retaining essential semantic content, representing a promising step towards more equitable LMs.
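The core mechanism the abstract describes (adding the negative energy gradient of attribute classifiers to an ODE drift during latent sampling) can be illustrated with a minimal sketch. This is not the paper's implementation: the latent dimension, the linear bias classifier, the contraction `base_drift`, and the `guidance` weight are all stand-in assumptions; a real system would use the learned latent ODE field and trained attribute probes.

```python
import numpy as np

# Hypothetical linear bias classifier p(bias | z) = sigmoid(w.z + b);
# in the paper such classifiers are trained in the LM's latent space,
# here the weights are random stand-ins.
rng = np.random.default_rng(0)
d = 8                            # assumed toy latent dimension
w = rng.normal(size=d)           # assumed linear-probe weights
b = 0.0

def energy(z):
    # EBM energy: large when the classifier predicts the biased attribute.
    logit = w @ z + b
    return np.log1p(np.exp(logit))          # softplus(logit)

def grad_energy(z):
    # Analytic gradient of softplus(w.z + b) w.r.t. z.
    logit = w @ z + b
    return (1.0 / (1.0 + np.exp(-logit))) * w

def base_drift(z, t):
    # Stand-in for the pretrained latent ODE drift; a simple
    # contraction toward the origin for illustration.
    return -z

def guided_ode_sample(z0, steps=100, dt=0.02, guidance=5.0):
    # Euler integration of dz/dt = f(z, t) - guidance * grad E(z):
    # the energy gradient steers the trajectory toward low-bias regions.
    z = z0.copy()
    for k in range(steps):
        z = z + dt * (base_drift(z, k * dt) - guidance * grad_energy(z))
    return z

z0 = rng.normal(size=d)
z_debiased = guided_ode_sample(z0)
print(energy(z0), energy(z_debiased))
```

Multiple bias attributes, as in the paper's joint-debiasing setting, would simply sum the energies (and hence the gradients) of several classifiers inside `energy` and `grad_energy`.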
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6103