Keywords: Large Language Models, Psychology, Cognitive Science, Fairness, Bias
Abstract: As large language models (LLMs) are adopted into frameworks that grant them capacities to make real decisions, the consequences of their social biases intensify. Yet, we argue that simply removing biases from models is not enough. Using a paradigm from the psychology literature, we demonstrate that LLMs can spontaneously develop novel social biases about artificial demographic groups even when no inherent differences exist. These biases lead to highly stratified task allocations, which are less fair than assignments made by human participants and are exacerbated in newer and larger models. Emergent biases like these have been shown in the social sciences to result from exploration-exploitation trade-offs, where the decision-maker explores too little, allowing early observations to strongly influence impressions of entire demographic groups. To alleviate this effect, we examine a series of interventions targeting system inputs, problem structure, and explicit steering. We find that explicitly incentivizing exploration most robustly reduces stratification, highlighting the need to better incorporate multifaceted objectives to mitigate bias. These results reveal that LLMs are not merely passive mirrors of human social bias, but can actively create new ones from experience, raising urgent questions about how these systems will shape societies over time.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13596