Keywords: Social Bias, Large Language Model, In-Context Learning
Abstract: In-context learning (ICL) has proven adept at adapting large language models (LLMs) to downstream tasks from a few demonstration examples, without any parameter updates.
Prior work has found that ICL performance is sensitive to the selection of examples in the prompt and has made efforts to stabilize it.
However, existing studies on example selection overlook the ethical risks behind the selected examples, such as gender and racial bias.
In this work, we first construct a new sentiment classification dataset, EEC-paraphrase, designed to better capture and evaluate the biases of LLMs.
Then, through further analysis, we discover that **1) example selection with high accuracy does not mean low bias; 2) example selection for ICL amplifies the biases of LLMs; 3) example selection contributes to spurious correlations of LLMs.**
Based on the above observations, we propose ***Re**mind with **B**ias-aware **E**mbedding* (**ReBE**), which removes spurious correlations through contrastive learning and obtains bias-aware embeddings for LLMs via prompt tuning.
Finally, we demonstrate that ReBE effectively mitigates biases of LLMs without significantly compromising accuracy and is highly compatible with existing example selection methods. *The implementation code is available at https://anonymous.4open.science/r/ReBE-1D04.*
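
For readers who want a concrete picture of the "contrastive learning + prompt tuning" recipe the abstract describes, below is a minimal, self-contained sketch. The frozen-encoder stand-in, additive prompt conditioning, dimensions, and counterfactual pairing are illustrative assumptions for exposition only, not the authors' ReBE implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# --- Hypothetical setup: dimensions and the encoder are placeholders, not the paper's ---
embed_dim, prompt_len, batch, temperature = 64, 8, 16, 0.1

# Frozen "LLM" stand-in: a fixed linear map whose weights are never updated.
frozen_encoder = torch.nn.Linear(embed_dim, embed_dim, bias=False)
for p in frozen_encoder.parameters():
    p.requires_grad_(False)

# Learnable soft prompt: the only trainable parameters (prompt tuning).
soft_prompt = torch.nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def represent(x: torch.Tensor) -> torch.Tensor:
    """Condition the frozen encoder on the (mean-pooled) soft prompt and L2-normalize."""
    conditioned = x + soft_prompt.mean(dim=0)   # simple additive conditioning, an assumption
    return F.normalize(frozen_encoder(conditioned), dim=-1)

def contrastive_loss(anchor: torch.Tensor, positive: torch.Tensor) -> torch.Tensor:
    """InfoNCE: pull each anchor toward its counterfactual positive,
    push it away from every other example in the batch."""
    logits = anchor @ positive.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(anchor.size(0))       # the i-th anchor matches the i-th positive
    return F.cross_entropy(logits, labels)

for step in range(200):
    # Each anchor/positive pair stands for the same sentence with a swapped
    # demographic term (e.g., "he" vs. "she"); here both are random placeholders.
    base = torch.randn(batch, embed_dim)
    anchor = represent(base)
    positive = represent(base + 0.05 * torch.randn(batch, embed_dim))  # counterfactual view
    loss = contrastive_loss(anchor, positive)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final contrastive loss: {loss.item():.4f}")
```

The point mirrored here is that only the soft prompt is optimized while the LLM stays frozen, and the contrastive objective aligns representations of counterfactual pairs so the tuned embedding stops encoding the spurious demographic signal.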
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4606