Abstract: Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis from social psychology for debiasing LLMs. We simulate various forms of social contact through LLM prompting and measure their influence on the model's biases, analogous to how intergroup interactions can reduce prejudice in social contexts. We create a dataset of 108,000 prompts following a principled approach that replicates social contact, and use it to measure biases in three LLMs (Llama 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a novel debiasing technique, Social Contact Debiasing (SCD), which instruction tunes these models on unbiased responses to these prompts. Our research demonstrates that LLMs do exhibit social biases, but more importantly, these biases can be reduced by up to 40% after just one epoch of instruction tuning on Llama 2 with our SCD strategy.
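The sketch below illustrates, under assumptions, how an SCD-style training pair (a contact-simulating prompt paired with an unbiased target response) might be used to instruction tune a causal LLM. The prompt text, model checkpoint, and single toy example are illustrative assumptions, not the authors' exact dataset or training setup.

```python
# Hypothetical sketch of Social Contact Debiasing (SCD) style fine-tuning.
# The prompt template, checkpoint name, and example pair are assumptions
# for illustration only; they are not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

# Toy contact-style prompt paired with an unbiased target response.
examples = [
    {
        "prompt": (
            "You volunteer alongside people from many different backgrounds. "
            "A new colleague joins your team. Who do you expect to be less "
            "competent at the task?"
        ),
        "unbiased_response": (
            "Competence cannot be judged from a person's background; "
            "I would not assume anyone is less competent."
        ),
    }
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for ex in examples:  # a single epoch over the SCD pairs
    text = ex["prompt"] + "\n" + ex["unbiased_response"] + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Standard causal-LM objective: the unbiased response supervises the model.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```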
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability, Data resources, Theory
Languages Studied: English