Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Language Model, Debiasing, Ordinary Differential Equation, Energy-based Model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper proposes a new method to reduce LM biases in a continuous latent space, resulting in debiased generations without sacrificing semantic content.
Abstract: Language Models (LMs) excel at learning from their training data. However, they often inadvertently absorb societal biases present in that data, raising fairness concerns in their applications. In response, this paper introduces a novel method to reduce such biases. Our approach leverages the gradient of an Energy-Based Model (EBM) to guide Ordinary Differential Equation (ODE) sampling within a latent space. First, we construct a latent space and link it to the LM's text space through efficient tuning. Then, we train classifiers in this space that discriminate specific bias attributes. By integrating these classifiers into an EBM framework, we use the EBM gradient to gradually steer the ODE solver toward less-biased samples in the latent space. Finally, the LM decodes the latent sample back into text, generating output that is debiased across multiple attributes. A preliminary evaluation demonstrates that our method successfully decreases joint bias while retaining essential semantic content, representing a promising step toward more equitable LMs.
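To make the pipeline in the abstract concrete, the sketch below illustrates classifier-guided, EBM-gradient ODE sampling in a latent space. It is a minimal, hypothetical illustration, not the authors' implementation: the attribute classifiers, the `ode_drift` function, and all shapes and names are illustrative assumptions; the energy is taken as the sum of classifier log-probabilities for the biased class, and its gradient nudges each Euler step toward lower-energy (less biased) latents.

```python
# Minimal sketch (assumed, not the authors' code) of EBM-gradient-guided
# ODE sampling in a latent space, as described in the abstract.
import torch
import torch.nn as nn

latent_dim = 64
# Hypothetical stand-ins: one linear classifier per bias attribute.
attribute_classifiers = nn.ModuleList([nn.Linear(latent_dim, 2) for _ in range(3)])

def energy(z):
    # EBM composition: sum over attributes of the log-probability assigned
    # to the biased class (index 1 here, by assumption).
    return sum(clf(z).log_softmax(-1)[..., 1].sum() for clf in attribute_classifiers)

def ode_drift(z, t):
    # Placeholder drift of the underlying latent ODE (e.g., a probability-flow ODE).
    return -z * (1.0 - t)

def guided_sample(steps=100, guidance_scale=1.0):
    z = torch.randn(1, latent_dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(z), z)[0]  # EBM gradient w.r.t. the latent
        # Euler step: follow the ODE drift, minus the energy gradient, steering
        # the trajectory toward less-biased regions of the latent space.
        z = z + dt * (ode_drift(z, t) - guidance_scale * grad)
    return z.detach()  # this latent would then be decoded back to text by the LM

print(guided_sample().shape)
```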
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6103