Abstract: Modern language models excel at factual reasoning but struggle with value diversity: the multiplicity of plausible human perspectives. Tasks such as hate speech or sexism detection expose this limitation, since human disagreement there captures the diversity of perspectives that models need to account for, rather than dataset noise. In this paper, we explore whether multi-perspective in-context learning (ICL) can align large language models (LLMs) with this diversity without parameter updates. We evaluate four LLMs on five datasets across three languages (English, Arabic, Italian), considering three label-space representations (aggregated hard, disaggregated hard, and disaggregated soft) and five demonstration selection and ordering strategies. Our multi-perspective approach outperforms standard prompting on aggregated English labels, while disaggregated soft predictions align better with human judgments on the Arabic and Italian datasets. These findings highlight the importance of perspective-aware LLMs for reducing bias and polarization, while also revealing the challenges of applying ICL to socially sensitive tasks. We further probe model faithfulness using XAI, offering insights into how LLMs handle human disagreement.
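To make the disaggregated-soft label representation concrete, the sketch below shows one plausible way to turn per-annotator labels into soft distributions and assemble a multi-perspective ICL prompt. The function names, demonstration format, and example texts are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter

def soft_label(annotations):
    # Convert a list of individual annotator labels into a soft label:
    # a probability distribution over the observed label space.
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}

def build_prompt(demos, query):
    # Assemble a multi-perspective ICL prompt: each demonstration pairs
    # a text with its disaggregated soft label, and the query text is
    # appended for the model to complete. Prompt wording is hypothetical.
    lines = []
    for text, annotations in demos:
        dist = soft_label(annotations)
        dist_str = ", ".join(f"{lbl}: {p:.2f}" for lbl, p in dist.items())
        lines.append(f"Text: {text}\nLabel distribution: {dist_str}")
    lines.append(f"Text: {query}\nLabel distribution:")
    return "\n\n".join(lines)

demos = [
    ("example post A", ["sexist", "sexist", "not sexist"]),
    ("example post B", ["not sexist", "not sexist", "not sexist"]),
]
prompt = build_prompt(demos, "example query post")
```

An aggregated-hard variant would instead show only the majority label per demonstration, discarding the disagreement signal that the soft format preserves.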