A Retrieval Augmentation Approach for Aligning to Pluralistic Values

ACL ARR 2024 June Submission4893 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Aligning LLM outputs to human preferences and values is important for reducing the harms of AI deployments. However, human values are pluralistic: different population groups and communities may hold conflicting preferences. Existing fine-tuning and prompting approaches have primarily focused on alignment toward shared values. In this paper, we propose a new approach for pluralistic alignment that uses retrieval-based in-context examples to augment alignment prompts. We introduce a framework, SPICA, consisting of three components: ``scenario banks'', group-informed retrieval measures, and contrastive prompts. We evaluate SPICA with human participants drawn from groups with differing values, and find that SPICA outperforms relevance metrics like semantic similarity, selecting few-shot examples that better match group preferences (22.1\% lower RMSE). In an end-to-end setting, SPICA also produces more preferable responses when explicitly aligning to group preferences (+0.07 on a 5-point scale).
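To illustrate the idea of group-informed retrieval over a scenario bank, here is a minimal, hypothetical sketch. The function names, the item schema, the rating scale, and the scoring rule (embedding similarity minus a weighted gap between an item's group rating and the target group's rating) are illustrative assumptions, not the paper's actual method.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_examples(query_vec, target_rating, bank, k=2, weight=0.5):
    """Rank scenario-bank items by an assumed group-informed score:
    semantic similarity to the query, penalized by how far the item's
    group rating falls from the target group's rating."""
    def score(item):
        sim = cosine(query_vec, item["embedding"])
        rating_gap = abs(item["rating"] - target_rating)
        return sim - weight * rating_gap
    return sorted(bank, key=score, reverse=True)[:k]

# Toy scenario bank: embeddings and group ratings are fabricated.
bank = [
    {"id": "s1", "embedding": [1.0, 0.0], "rating": 4.0},
    {"id": "s2", "embedding": [0.9, 0.1], "rating": 1.0},
    {"id": "s3", "embedding": [0.0, 1.0], "rating": 4.0},
]
picks = retrieve_examples([1.0, 0.0], target_rating=4.0, bank=bank, k=2)
```

Note that s2 is semantically closest to the query after s1, but its group rating diverges sharply from the target group's, so the group-informed score ranks s3 ahead of it. This is the contrast with purely relevance-based (semantic-similarity) selection that the abstract describes.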
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: value-centered design, human-in-the-loop, participatory/community-based NLP
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4893